The Idea

TinyZephyr wasn't tiny enough.

NanoZephyr wasn't tiny enough.

AtomZephyr still had too many parameters.

So we continued removing neurons until the model reached a point where modern physics politely asked us to stop.

We ignored it.

SubatomZephyr is the result.


Why?

Most research asks:

"How can we make models smarter?"

We asked:

"How many parameters can we delete before Git starts feeling sorry for us?"

This repository is the answer.


Specifications

Property          Value
Parameters        21
Architecture      GPT-2
Layers            1
Attention Heads   1
Embedding Size    1
Context Length    1
Vocabulary        2 tokens
Disk Size         <2 KB
Training Time     ~20 seconds
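
For the curious, here is one way the 21 could add up. This is a sketch, not the training code: it assumes a standard GPT-2 block layout, tied input/output embeddings, biases on every linear layer, and an MLP hidden width of 1. The table does not state that width; it is an assumption, and `gpt2_param_count` is a hypothetical helper written for this illustration.

```python
# Hypothetical helper: counts trainable parameters for a GPT-2-style model.
# Assumes tied input/output embeddings and a bias on every linear layer.
def gpt2_param_count(vocab, n_ctx, d_model, n_layer, d_mlp):
    embeddings = vocab * d_model + n_ctx * d_model     # token + position tables
    layer_norm = 2 * d_model                           # scale + shift
    attention = d_model * 3 * d_model + 3 * d_model    # fused Q/K/V projection
    attention += d_model * d_model + d_model           # attention output projection
    mlp = (d_model * d_mlp + d_mlp) + (d_mlp * d_model + d_model)
    block = 2 * layer_norm + attention + mlp           # two layer norms per block
    return embeddings + n_layer * block + layer_norm   # plus the final layer norm

# d_mlp=1 is an assumption; the other values come from the table above.
total = gpt2_param_count(vocab=2, n_ctx=1, d_model=1, n_layer=1, d_mlp=1)
print(total)      # 21
print(total * 4)  # 84 bytes of raw weights, assuming 32-bit floats
```

The number of attention heads does not appear in the count, since heads only split the same projection matrices. Under the same assumptions, an MLP width of 4 would give 30 parameters instead, which is why a width of 1 is the guess here.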

Performance

Task                Result
Copy "a"            ✅
Copy "b"            ✅
Understand Humans   ❌
Understand Itself   ❌
Break Records       ✅
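
The two copy rows could be checked with something as small as the sketch below. It is an illustration only: `copy_accuracy` and `identity_stub` are hypothetical names, and the stub stands in for a real model call, so running it says nothing about the actual weights.

```python
# Hypothetical evaluation helper; `predict` stands in for a real model call.
def copy_accuracy(predict, vocab):
    """Return the fraction of tokens in `vocab` that `predict` echoes back."""
    hits = sum(1 for token in vocab if predict(token) == token)
    return hits / len(vocab)


def identity_stub(token):
    # Stand-in for the model: returns its input unchanged.
    return token


print(copy_accuracy(identity_stub, ["a", "b"]))  # 1.0
```

Swapping `identity_stub` for a function that wraps the real model would turn this into an actual measurement.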

Quick Start

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("WhirlwindAI/SubatomZephyr")
model = AutoModelForCausalLM.from_pretrained("WhirlwindAI/SubatomZephyr")

prompt = "a"

# Encode the prompt as PyTorch tensors.
inputs = tokenizer(prompt, return_tensors="pt")

# max_length counts the prompt tokens too; sampling is enabled.
outputs = model.generate(
    **inputs,
    max_length=2,
    do_sample=True,
    temperature=2.0
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Output

a

Peak artificial intelligence.


Example Conversation

User

Tell me a story.

SubatomZephyr

a

Oscar-worthy.


Scientific Explanation

SubatomZephyr doesn't generate language.

It doesn't reason.

It doesn't predict.

It doesn't even pretend anymore.

It has mastered exactly one skill:

Input:

a

Output:

a

100% accuracy.

Zero creativity.

Perfect confidence.


World Records

🥇 Smallest Generative Transformer

🏆 Highest Accuracy On The Letter "a"

🥈 Lowest Grocery Bill (21 Parameters)

🥉 First Model Smaller Than Most README Files

🎖️ Certified Quantum Intelligence™


Benchmarks

MMLU          : 💀

HumanEval     : 😂

TruthfulQA    : 🤨

Binary Copy   : 🏆 100%

Entertainment : ⭐⭐⭐⭐⭐

Frequently Asked Questions

Is this useful?

No.

Is this funny?

Hopefully.

Why does it exist?

Curiosity.

Can it beat GPT-4?

Only if the task is copying the letter "a".

What's next?

QuarkZephyr.

Probably.


Fun Facts

  • Smaller than many favicon files.

  • Downloads before you click download.

  • Has fewer parameters than this README has paragraphs.

  • The tokenizer is more complicated than the model.

  • Uses more electricity displaying this README than running inference.


Limitations

SubatomZephyr should not be used for:

  • Chatbots

  • Coding

  • Translation

  • Math

  • Science

  • Existing in production

It excels primarily at making ML engineers laugh.


License

MIT

If you somehow improve this model...

please tell us.

We're genuinely curious.


Built by WhirlwindAI

"When there are no parameters left to remove... remove expectations instead."

