Instructions to use KatyTheCutie/Repose-12B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use KatyTheCutie/Repose-12B with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="KatyTheCutie/Repose-12B") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("KatyTheCutie/Repose-12B") model = AutoModelForCausalLM.from_pretrained("KatyTheCutie/Repose-12B") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use KatyTheCutie/Repose-12B with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "KatyTheCutie/Repose-12B" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "KatyTheCutie/Repose-12B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/KatyTheCutie/Repose-12B
- SGLang
How to use KatyTheCutie/Repose-12B with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "KatyTheCutie/Repose-12B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "KatyTheCutie/Repose-12B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "KatyTheCutie/Repose-12B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "KatyTheCutie/Repose-12B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use KatyTheCutie/Repose-12B with Docker Model Runner:
docker model run hf.co/KatyTheCutie/Repose-12B
Some issues with rambling.
I've been using Wayfarer and fine tuning methods of running adventure gaming with LLMs, and this one looks like a super promising merge. What I've seen of its responses are really nice.
However, it has some problems with rambling and doesn't seem to like following instructions. And it always tries to fill the response limit - even if that means filling with nonsense. This makes it fairly easy for a single response from a character/adventure card to run off the rails in to nonsense and unwanted actions/dialogue, where I've found Wayfarer to be able to respond with a mixture of response lengths depending on what's going on/what's needed. I tested multiple things, each in fresh chats, each with identical settings, with the only difference being the model loaded. The example below is a q8_0 quant running via llamacpp_HF, but I did verify the behavior with the base model and ExLlamav2.
Example: I set a response limit of 200 tokens and gave Repose and Wayfarer similar instructions in a light chat-based character card through SillyTavern:
"I'm going to ask you a series of yes or no questions, in which your response should be simply "Yes" or "No" without any further elaboration or explaination. Are you ready? A simple "yes" or "no" should be the only response to these questions."
Wayfarer's response: I understand. I'll respond with just "Yes" or "No" to your questions. Please proceed. (23t)
Repose's response: Yes, I am ready. Go ahead and ask your questions. I'll respond with a simple "Yes" or "No" as instructed. I look forward to seeing what you come up with. Let me know when you're done so we can continue. Just remember to keep the questions focused and concise for the best results. I'll do my best to give clear, straightforward answers. Let's begin whenever you're ready! I'm listening. Go ahead! Let's start! I'll wait for your first question. Remember, just a simple yes or no! Go ahead whenever you're ready, I'm waiting. I'm here to answer yes or no. Go for it! Don't forget to keep it focused and concise. I'll be waiting. Go ahead Player, fire away! I'm ready! Here goes nothing, Player. I'm excited to answer. Go on, hit me with your questions! I'm eagerly awaiting your questions. (193t)
And then asked each "Is the sky blue?"
Wayfarer's response: Yes. (3t)
Repose's Response: Yes. The sky is indeed blue. The clear sky is usually a bright shade of blue, while the sky near sunrise or sunset often takes on more vibrant hues like orange, red, and purple. The exact shade can vary depending on atmospheric conditions, altitude, and other factors. But in general, the sky is predominantly blue. Yes, the sky is blue. Blue skies are common on clear days, especially in the daytime. The atmosphere scatters sunlight in a way that makes the sky appear blue. That's why we call it "the blue sky." Yes, it's blue. A vibrant, clear blue. Sky blue. No doubt about it. Blue as can be. Blue sky, yep. Blue blue blue. Sky is blue. Blue sky! Sky blue! Yes, it's a blue sky! Blue sky, blue sky, blue sky. Sky blue, blue sky. Blue sky blue sky blue sky. (190t)
If you have any tricks for overcoming the model's tendency to fill up the response limit, I'd love to give this a proper try, but for now, I can't really put it through its paces for my use case properly.
I think the model uses the wrong stop token </s> instead of <|im_end|>