Kirk Ballerina is the second installment in the Kirk series of tiny language models with fully license-compliant and ethical data provenance. At nearly double the size in both parameter count and training data, Ballerina is expected to show noticeable improvements in performance.

Unlike other language models, Kirk is not designed around safe or useful output. Instead, it is built around the creation of interesting and humorous content. Taboo content and internet community spaces are overrepresented within its training dataset. Additionally, we have been careful to introduce as little synthetic (LLM-generated) data as possible.

Benchmarks

Ballerina has middling benchmark performance compared to other models of its size because of the restrictions on its training dataset, but it still improves on its predecessors.

Benchmark (0-shot)	Value
ARC-Easy	32.91%
BLiMP	77.14%
HellaSwag	26.40%
WikiText-2 (byte)	2.23

Technical details

Kirk Ballerina was trained on ~2.5 billion tokens over the course of 15 hours on an A100 GPU.

Name	Value
Architecture	Llama
Context Length	1028
Vocab size	32606
RoPE theta	10000

num_attention_heads = 16
num_key_value_heads = 2
num_hidden_layers = 24
hidden_size = 512
intermediate_size = 1728

tie_word_embeddings = True
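
As a sanity check, the configuration above can be turned into a parameter count. The sketch below assumes the usual Llama layer layout (bias-free projections, a gated MLP with three weight matrices, and two RMSNorm weight vectors per layer) and uses the vocabulary size from the table. The function name `count_llama_params` is purely illustrative and not part of any library.

```python
def count_llama_params(
    vocab_size: int,
    hidden_size: int,
    intermediate_size: int,
    num_hidden_layers: int,
    num_attention_heads: int,
    num_key_value_heads: int,
    tie_word_embeddings: bool,
) -> int:
    """Count the weights of a Llama-style decoder (assumes no bias terms)."""
    head_dim = hidden_size // num_attention_heads
    kv_dim = num_key_value_heads * head_dim

    embedding = vocab_size * hidden_size
    # Query and output projections are square; key/value projections are
    # narrower because grouped-query attention shares KV heads.
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden_size

    total = embedding + num_hidden_layers * (attention + mlp + norms)
    total += hidden_size  # final RMSNorm
    if not tie_word_embeddings:
        total += vocab_size * hidden_size  # separate output head
    return total


total = count_llama_params(
    vocab_size=32606,
    hidden_size=512,
    intermediate_size=1728,
    num_hidden_layers=24,
    num_attention_heads=16,
    num_key_value_heads=2,
    tie_word_embeddings=True,
)
print(f"{total:,} parameters")
```

Under these assumptions the count comes to roughly 94.6 million, in line with the model size reported for the checkpoint. With 16 attention heads and 2 key/value heads, each key/value head is shared by 8 query heads.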

Training hyperparameters

Name	Value
Per-device batch size	12
Learning rate	5e-4
Learning rate scheduler	Cosine
Weight decay	0.1
Epochs	1
Gradient accumulation steps	8
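
The hyperparameters above imply an effective batch size and a rough step count. The sketch below is back-of-the-envelope arithmetic, not the actual training script. It assumes that every sequence is packed to the full context length of 1028 tokens and that the run saw exactly 2.5 billion tokens, neither of which is stated precisely here.

```python
per_device_batch_size = 12
gradient_accumulation_steps = 8
num_devices = 1           # training used a single A100
context_length = 1028     # assumption: every sequence fills the context
total_tokens = 2.5e9      # approximate figure ("~2.5 billion")
training_hours = 15

# Sequences that contribute to one optimizer update.
effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)
tokens_per_step = effective_batch_size * context_length
optimizer_steps = total_tokens / tokens_per_step
tokens_per_second = total_tokens / (training_hours * 3600)

print(f"effective batch size: {effective_batch_size} sequences")
print(f"tokens per optimizer step: {tokens_per_step:,}")
print(f"approximate optimizer steps: {optimizer_steps:,.0f}")
print(f"approximate throughput: {tokens_per_second:,.0f} tokens/s")
```

That works out to 96 sequences, or about 99 thousand tokens, per optimizer step, and on the order of 25 thousand optimizer steps for the single epoch.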

Example completions

Prompt: "Once upon a time,"

Once upon a time, I felt my mother's voice again. She looked at me with a sad face and said,

"I'm sorry, but I'm afraid of you.""I see, my poor mother. I'm sorry." She looked at me, "My mother."

"My mother's?"

"My mother's."

"Do you know how to speak? You haven't been around with me for a long time."

"I... I don't know. I can't. I've been alone for days." She shook her head. "I can't be. I can't be. I don't know. I don't know where I'm going."

"I... I can't," I said, "I don't know how to speak."

"I think I'll be fine. I'm sure that I'll be fine. My parents will always be here, and I don't want to be around them anymore. I don't know how to talk."

"Yes?" I said, "I don't know what's happening. I don't know how to speak."

"I don't know what's going on." I said, "I don't know how to speak. I'm afraid

Prompt: "Artificial intelligence is"

Artificial intelligence is important. So, I have no idea how to fix this, but it would be a very important problem for me to do. For example, in the last two games, you are likely to have a computer that fits right in your face.

Also, I tried to explain to you, how you can access the game and how to play, and that's how I did, so I should go back to that. I also have a problem with this.

Anyway, I did a little research on this, which I am now thinking about. I have to agree with you about that, I have no idea how to level up the game. I hope you find some better ways to fix it.

I'm still thinking about the possibility of playing a game or something. I'm sure it could work and I hope I can. Thanks again.

-Vincent

Category:Blog posts

How to use

The easiest way to use Kirk Ballerina programmatically is with a pipeline in Hugging Face Transformers.

$ pip install transformers torch

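import torch
from transformers import pipeline

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
pipe = pipeline(
    "text-generation", model="rtc2022/kirk-ballerina", device=device
)

# Generate a continuation for a prompt and print the text.
output = pipe("Once upon a time,", max_new_tokens=100)
print(output[0]["generated_text"])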

Support the creator

Support the creation of more models like this one by directly supporting the main creator:

Support Lopes on Ko-Fi

Safetensors

Name	Value
Model size	94.6M params
Tensor type	F32
