A jokerfied Sam Altman holds a sign that says "Kirk is Sin". Beside him is the text: "Counting or not counting inference costs?"

Kirk Ballerina is the second installment in the Kirk series of tiny language models with fully license-compliant and ethical data provenance. It is nearly double the size of the previous installment in both parameter count and training dataset, so we expect noticeable improvements in performance.

Unlike other language models, Kirk is not designed around safe or useful output. Instead, it is built around the creation of interesting and humorous content. Taboo content and internet community spaces are overrepresented within its training dataset. Additionally, we have been careful to introduce as little synthetic (LLM-generated) data as possible.

Benchmarks

Ballerina has middling benchmark performance compared to other models of its size because of the restrictions on its input dataset, but it still shows improvements over its predecessors.

Benchmark (0-shot)    Value
ARC-Easy              32.91%
BLiMP                 77.14%
HellaSwag             26.40%
WikiText-2 (byte)     2.23

Technical details

Kirk Ballerina was trained on ~2.5 billion tokens over the course of 15 hours on an A100 GPU.
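As a rough sanity check, the training figures above imply an average throughput of about 46,000 tokens per second. The sketch below just does that division. Both inputs are the approximate values quoted in the text, and the helper name `tokens_per_second` is made up for illustration.

```python
# Rough average training throughput implied by the figures above.
# Both inputs are approximate ("~2.5 billion tokens", "15 hours").
TOKENS_TRAINED = 2.5e9
TRAINING_HOURS = 15


def tokens_per_second(tokens: float, hours: float) -> float:
    """Average tokens processed per second over the whole run."""
    return tokens / (hours * 3600)


throughput = tokens_per_second(TOKENS_TRAINED, TRAINING_HOURS)
print(f"~{throughput:,.0f} tokens/s")
```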

Name              Value
Architecture      Llama
Context Length    1028
Vocab size        32606
RoPE theta        10000
num_attention_heads = 16
num_key_value_heads = 2
num_hidden_layers = 24
hidden_size = 512
intermediate_size = 1728

tie_word_embeddings = True
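The parameter count can be estimated from these values. The sketch below assumes a standard Llama layout: no bias terms, one RMSNorm weight vector per norm, a head dimension of hidden_size / num_attention_heads, and an output head that reuses the embedding matrix when embeddings are tied. The function name `llama_param_count` is hypothetical. Under those assumptions the total comes to about 94.6M, which agrees with the model size listed for the checkpoint.

```python
def llama_param_count(
    vocab_size: int,
    hidden_size: int,
    intermediate_size: int,
    num_hidden_layers: int,
    num_attention_heads: int,
    num_key_value_heads: int,
    tie_word_embeddings: bool,
) -> int:
    """Estimate the parameter count of a Llama-style decoder (no biases assumed)."""
    head_dim = hidden_size // num_attention_heads  # assumption: not overridden
    kv_dim = num_key_value_heads * head_dim        # grouped-query K/V width

    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # gate_proj, up_proj and down_proj each hold hidden x intermediate weights.
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms

    embeddings = vocab_size * hidden_size
    # With tied embeddings the output head adds no extra parameters.
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    final_norm = hidden_size

    return embeddings + num_hidden_layers * per_layer + final_norm + lm_head


total = llama_param_count(
    vocab_size=32606,
    hidden_size=512,
    intermediate_size=1728,
    num_hidden_layers=24,
    num_attention_heads=16,
    num_key_value_heads=2,
    tie_word_embeddings=True,
)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

With only 2 key/value heads against 16 query heads, the K and V projections are an eighth the width of the query projection, so most of each layer's weights sit in the MLP.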

Training hyperparameters

Name                           Value
Per-device batch size          12
Learning rate                  5e-4
Learning scheduler             Cosine
Weight decay                   0.1
Epochs                         1
Gradient accumulation steps    8
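These settings determine the effective batch size and, roughly, the number of optimizer steps. The sketch below works through that arithmetic under several assumptions not stated in the card: a single device, every sequence packed to the full context length of 1028 tokens, and a cosine schedule with no warmup that decays to zero. The helper name `cosine_lr` is made up for illustration.

```python
import math

# Values from the tables above.
PER_DEVICE_BATCH_SIZE = 12
GRAD_ACCUM_STEPS = 8
CONTEXT_LENGTH = 1028
PEAK_LR = 5e-4
TOKENS_TRAINED = 2.5e9  # approximate
NUM_DEVICES = 1         # assumption: training ran on a single A100

# Sequences and tokens consumed per optimizer step.
sequences_per_step = PER_DEVICE_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES
# Assumption: every sequence is packed to the full context length.
tokens_per_step = sequences_per_step * CONTEXT_LENGTH
approx_steps = TOKENS_TRAINED / tokens_per_step


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr to 0 (assumption: no warmup, no LR floor)."""
    progress = min(step / total_steps, 1.0)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"{sequences_per_step} sequences/step, {tokens_per_step:,} tokens/step")
print(f"~{approx_steps:,.0f} optimizer steps for one epoch")
```

If the actual run used warmup or shorter sequences, the step count and learning-rate curve would differ from this estimate.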

Example completions

Prompt: "Once upon a time,"

Once upon a time, I felt my mother's voice again. She looked at me with a sad face and said,

"I'm sorry, but I'm afraid of you.""I see, my poor mother. I'm sorry." She looked at me, "My mother."

"My mother's?"

"My mother's."

"Do you know how to speak? You haven't been around with me for a long time."

"I... I don't know. I can't. I've been alone for days." She shook her head. "I can't be. I can't be. I don't know. I don't know where I'm going."

"I... I can't," I said, "I don't know how to speak."

"I think I'll be fine. I'm sure that I'll be fine. My parents will always be here, and I don't want to be around them anymore. I don't know how to talk."

"Yes?" I said, "I don't know what's happening. I don't know how to speak."

"I don't know what's going on." I said, "I don't know how to speak. I'm afraid

Prompt: "Artificial intelligence is"

Artificial intelligence is important. So, I have no idea how to fix this, but it would be a very important problem for me to do. For example, in the last two games, you are likely to have a computer that fits right in your face.

Also, I tried to explain to you, how you can access the game and how to play, and that's how I did, so I should go back to that. I also have a problem with this.

Anyway, I did a little research on this, which I am now thinking about. I have to agree with you about that, I have no idea how to level up the game. I hope you find some better ways to fix it.

I'm still thinking about the possibility of playing a game or something. I'm sure it could work and I hope I can. Thanks again.

-Vincent

Category:Blog posts

How to use

The easiest way to use Kirk Ballerina programmatically is with a pipeline in HF transformers.

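$ pip install transformers torch

import torch
from transformers import pipeline

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# Load the model into a text-generation pipeline.
pipe = pipeline(
    "text-generation", model="rtc2022/kirk-ballerina", device=device
)

# Generate a completion; max_new_tokens is an arbitrary illustrative value.
output = pipe("Once upon a time,", max_new_tokens=128)
print(output[0]["generated_text"])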

Support the creator

Support the creation of more models like this one by directly supporting the main creator:

Support Lopes on Ko-Fi

Model size: 94.6M params (Safetensors, tensor type F32)