Kirk Ballerina is the second installment in the Kirk series of tiny language models with fully license-compliant and ethical data provenance. It is nearly double the size of its predecessor in both parameter count and training dataset, so we expect noticeable improvements in performance.
Unlike other language models, Kirk is not designed around safe or useful output. Instead, it is built around the creation of interesting and humorous content. Taboo content and internet community spaces are overrepresented within its training dataset. Additionally, we have been careful to introduce as little synthetic (LLM-generated) data as possible.
Benchmarks
Ballerina has middling benchmark performance compared to other models of its size because of the restrictions on its input dataset, but it still shows improvements over its predecessors.
| Benchmark (0-shot) | Value |
|---|---|
| ARC-Easy | 32.91% |
| BLiMP | 77.14% |
| HellaSwag | 26.40% |
| WikiText-2 (byte) | 2.23 |
Technical details
Kirk Ballerina was trained on ~2.5 billion tokens over the course of 15 hours on an A100 GPU.
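As a rough sanity check on the figures above, the average training throughput can be worked out from the token count and wall-clock time. This is only a back-of-the-envelope sketch: the token count is approximate ("~2.5 billion"), and the variable and function names are illustrative, not taken from the training code.

```python
# Figures quoted in the text: ~2.5 billion tokens, 15 hours, one A100 GPU.
TOKENS_TRAINED = 2.5e9   # approximate, as stated in the text
TRAINING_HOURS = 15


def tokens_per_second(tokens: float, hours: float) -> float:
    """Average throughput over the whole run, ignoring any pauses or evaluation time."""
    return tokens / (hours * 3600)


throughput = tokens_per_second(TOKENS_TRAINED, TRAINING_HOURS)
print(f"{throughput:,.0f} tokens/s")
```

Under these numbers the average works out to a little over 46,000 tokens per second.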
| Name | Value |
|---|---|
| Architecture | Llama |
| Context Length | 1028 |
| Vocab size | 32606 |
| RoPE theta | 10000 |
num_attention_heads = 16
num_key_value_heads = 2
num_hidden_layers = 24
hidden_size = 512
intermediate_size = 1728
tie_word_embeddings = True
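The configuration values above are enough to estimate the parameter count. The sketch below assumes a standard Llama layout that the card does not spell out: no bias terms, a head dimension of `hidden_size / num_attention_heads`, gated MLPs with three projection matrices, and two RMSNorm weight vectors per layer plus a final one. Treat the result as an estimate, not an official figure.

```python
# Values copied from the configuration listed above.
vocab_size = 32606
hidden_size = 512
intermediate_size = 1728
num_hidden_layers = 24
num_attention_heads = 16
num_key_value_heads = 2

# Assumption: head_dim is hidden_size / num_attention_heads (the usual Llama default).
head_dim = hidden_size // num_attention_heads
kv_dim = num_key_value_heads * head_dim  # grouped-query attention shrinks K and V

# Embedding matrix; counted once because tie_word_embeddings = True
# means the output head reuses it.
embedding = vocab_size * hidden_size

# Q and O projections are hidden x hidden; K and V are hidden x kv_dim.
attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim

# Gate, up, and down projections of the MLP.
mlp = 3 * hidden_size * intermediate_size

# Two norm weight vectors per layer (assumed RMSNorm, no bias).
norms = 2 * hidden_size

per_layer = attention + mlp + norms
total = embedding + num_hidden_layers * per_layer + hidden_size  # + final norm

print(f"{total:,} parameters")
```

Under these assumptions the estimate comes to roughly 94.6 million parameters, of which about 16.7 million sit in the shared embedding matrix.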
Training hyperparameters
| Name | Value |
|---|---|
| Per-device batch size | 12 |
| Learning rate | 5e-4 |
| Learning scheduler | Cosine |
| Weight decay | 0.1 |
| Epochs | 1 |
| Gradient accumulation steps | 8 |
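These hyperparameters imply an effective batch size and an approximate number of optimizer steps. The sketch below makes several assumptions that the card does not confirm: training ran on a single device, every sequence was filled to the full context length of 1028 tokens, and the cosine schedule had no warmup and decayed to zero. The `cosine_lr` helper is a hypothetical illustration of the schedule's shape, not the scheduler actually used.

```python
import math

# Values from the tables above.
per_device_batch_size = 12
gradient_accumulation_steps = 8
context_length = 1028
peak_learning_rate = 5e-4
tokens_trained = 2.5e9  # approximate

# Assumption: one device, so the effective batch is batch size x accumulation steps.
sequences_per_step = per_device_batch_size * gradient_accumulation_steps

# Assumption: every sequence is filled to the full context length.
tokens_per_step = sequences_per_step * context_length
total_steps = int(tokens_trained // tokens_per_step)


def cosine_lr(step: int, num_steps: int, peak: float = peak_learning_rate) -> float:
    """Cosine decay from `peak` to zero; assumes no warmup phase."""
    progress = min(step, num_steps) / num_steps
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


print(f"{sequences_per_step} sequences, {tokens_per_step:,} tokens per optimizer step")
print(f"about {total_steps:,} optimizer steps in one epoch")
print(f"learning rate halfway through: {cosine_lr(total_steps // 2, total_steps):.2e}")
```

With these assumptions each optimizer step covers 96 sequences, and a single epoch over ~2.5 billion tokens takes on the order of 25,000 steps.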
Example completions
Prompt: "Once upon a time,"
Once upon a time, I felt my mother's voice again. She looked at me with a sad face and said,
"I'm sorry, but I'm afraid of you.""I see, my poor mother. I'm sorry." She looked at me, "My mother."
"My mother's?"
"My mother's."
"Do you know how to speak? You haven't been around with me for a long time."
"I... I don't know. I can't. I've been alone for days." She shook her head. "I can't be. I can't be. I don't know. I don't know where I'm going."
"I... I can't," I said, "I don't know how to speak."
"I think I'll be fine. I'm sure that I'll be fine. My parents will always be here, and I don't want to be around them anymore. I don't know how to talk."
"Yes?" I said, "I don't know what's happening. I don't know how to speak."
"I don't know what's going on." I said, "I don't know how to speak. I'm afraid
Prompt: "Artificial intelligence is"
Artificial intelligence is important. So, I have no idea how to fix this, but it would be a very important problem for me to do. For example, in the last two games, you are likely to have a computer that fits right in your face.
Also, I tried to explain to you, how you can access the game and how to play, and that's how I did, so I should go back to that. I also have a problem with this.
Anyway, I did a little research on this, which I am now thinking about. I have to agree with you about that, I have no idea how to level up the game. I hope you find some better ways to fix it.
I'm still thinking about the possibility of playing a game or something. I'm sure it could work and I hope I can. Thanks again.
-Vincent
Category:Blog posts
How to use
The easiest way to use Kirk Ballerina programmatically is with a pipeline in Hugging Face Transformers.
$ pip install transformers torch
import torch
from transformers import pipeline
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
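pipe = pipeline(
    "text-generation", model="rtc2022/kirk-ballerina", device=device
)
# Generate a completion and print the text (the prompt is included in the output)
result = pipe("Once upon a time,", max_new_tokens=128)
print(result[0]["generated_text"])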
Support the creator
Support the creation of more models like this one by directly supporting the main creator:
