| --- |
| license: mit |
| language: |
| - en |
| pipeline_tag: text-generation |
| tags: |
| - transformer |
| - gpt2 |
| - smallest |
| - experimental |
| - humor |
| - whirlwindai |
| - subatomic |
| --- |
| |
| <div align="center"> |
|
|
| <img src="https://readme-typing-svg.demolab.com?font=Space+Grotesk&weight=700&size=27&duration=1800&pause=900&color=EC4899&center=true&vCenter=true&width=900&lines=SubatomZephyr;21+Parameters.;The+Entire+Model+Fits+In+A+Sentence.;Physics+Would+Like+A+Word.;Somehow+Still+A+Transformer." /> |
|
|
| <br> |
|
|
| <img src="https://img.shields.io/badge/Parameters-21-EC4899?style=for-the-badge"> |
| <img src="https://img.shields.io/badge/Architecture-GPT--2-F472B6?style=for-the-badge"> |
| <img src="https://img.shields.io/badge/Context-1_Token-F9A8D4?style=for-the-badge"> |
| <img src="https://img.shields.io/badge/IQ-21-E879F9?style=for-the-badge"> |
|
|
| <br><br> |
|
|
| <img src="https://capsule-render.vercel.app/api?type=transparent&height=200&text=SubatomZephyr&animation=scaleIn&fontColor=ffffff&fontSize=46&color=0:EC4899,100:8B5CF6"/> |
|
|
| </div> |
|
|
| --- |
|
|
| # The Idea |
|
|
| <div align="center"> |
|
|
| <table width="92%"> |
|
|
| <tr> |
|
|
| <td align="center"> |
|
|
| ## TinyZephyr wasn't tiny enough. |
|
|
| NanoZephyr wasn't tiny enough. |
|
|
| AtomZephyr still had **too many** parameters. |
|
|
| So we continued removing neurons until the model reached a point where modern physics politely asked us to stop. |
|
|
| We ignored them. |
|
|
| **SubatomZephyr** is the result. |
|
|
| </td> |
|
|
| </tr> |
|
|
| </table> |
|
|
| </div> |
|
|
| --- |
|
|
| # Why? |
|
|
| Most research asks: |
|
|
| > "How can we make models smarter?" |
|
|
| We asked: |
|
|
| > **"How many parameters can we delete before Git starts feeling sorry for us?"** |
|
|
| This repository is the answer. |
|
|
| --- |
|
|
| # Specifications |
|
|
| | Property | Value | |
| |-----------|-------| |
| | Parameters | **21** | |
| | Architecture | GPT-2 | |
| | Layers | 1 | |
| | Attention Heads | 1 | |
| | Embedding Size | 1 | |
| | Context Length | 1 | |
| | Vocabulary | 2 Tokens | |
| | Disk Size | <2 KB | |
| | Training Time | ~20 Seconds | |
|
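
For the skeptics, the 21 can be reproduced by hand. The sketch below tallies the weights of a GPT-2 style model using the values from the table above. It assumes tied input/output embeddings and a feed-forward width of 1; `n_inner=1` is an assumption made here for illustration, not something the table states (GPT-2's default of 4× the embedding size would give 30 instead). The function name `gpt2_param_count` is also illustrative. The number of attention heads does not appear because it does not change the count.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer, n_inner):
    """Count trainable parameters of a GPT-2 style model with tied embeddings."""
    token_emb = vocab_size * n_embd            # token embedding (shared with the output head)
    pos_emb = n_positions * n_embd             # learned position embedding
    layer_norm = 2 * n_embd                    # weight + bias
    attn_qkv = n_embd * 3 * n_embd + 3 * n_embd  # fused query/key/value projection
    attn_out = n_embd * n_embd + n_embd        # attention output projection
    mlp_in = n_embd * n_inner + n_inner        # feed-forward up projection
    mlp_out = n_inner * n_embd + n_embd        # feed-forward down projection
    # Each block has two layer norms, attention, and the feed-forward network.
    per_layer = 2 * layer_norm + attn_qkv + attn_out + mlp_in + mlp_out
    final_norm = layer_norm
    return token_emb + pos_emb + n_layer * per_layer + final_norm


# Values from the specification table; n_inner=1 is an assumption.
total = gpt2_param_count(vocab_size=2, n_positions=1, n_embd=1, n_layer=1, n_inner=1)
print(total)
```

Under these assumptions the tally comes to 21, which at 4 bytes per float32 weight is 84 bytes of actual model.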
|
| --- |
|
|
| # Performance |
|
|
| | Task | Result | |
| |------|--------| |
| | Copy "a" | ✅ | |
| | Copy "b" | ✅ | |
| | Understand Humans | ❌ | |
| | Understand Itself | ❌ | |
| | Break Records | ✅ | |
|
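
The two copy tasks above are simple enough to score with a few lines. The following is a minimal sketch of an exact-match check, not the evaluation used for this model card. `copy_accuracy` and `echo` are hypothetical names, and `echo` is a stand-in that would be replaced by a call into the real model.

```python
def copy_accuracy(generate, vocab):
    """Return the fraction of tokens in `vocab` that `generate` returns unchanged."""
    if not vocab:
        raise ValueError("vocab must not be empty")
    hits = sum(1 for token in vocab if generate(token) == token)
    return hits / len(vocab)


# Hypothetical stand-in for the model: it simply echoes its input.
def echo(token):
    return token


accuracy = copy_accuracy(echo, ["a", "b"])
print(f"Binary copy accuracy: {accuracy:.0%}")
```

Any callable that maps one token to one token can be passed as `generate`, so the same check works for a wrapper around `model.generate`.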
|
| --- |
|
|
| # Quick Start |
|
|
| ```python |
| from transformers import AutoTokenizer, AutoModelForCausalLM |
| |
| # Load the tokenizer and the 21-parameter model from the Hub. |
| tokenizer = AutoTokenizer.from_pretrained("WhirlwindAI/SubatomZephyr") |
| model = AutoModelForCausalLM.from_pretrained("WhirlwindAI/SubatomZephyr") |
| |
| prompt = "a" |
| |
| inputs = tokenizer(prompt, return_tensors="pt") |
| |
| # max_length counts the prompt token, so this generates one new token. |
| # Sampling is enabled, so results can vary; set do_sample=False for greedy decoding. |
| outputs = model.generate( |
|     **inputs, |
|     max_length=2, |
|     do_sample=True, |
|     temperature=2.0, |
| ) |
| |
| print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
| ``` |
|
|
| Output: |
|
|
| ``` |
| a |
| ``` |
|
|
| Peak artificial intelligence. |
|
|
| --- |
|
|
| # Example Conversation |
|
|
| **User** |
|
|
| > Tell me a story. |
|
|
| **SubatomZephyr** |
|
|
| ``` |
| a |
| ``` |
|
|
| Oscar-worthy. |
|
|
| --- |
|
|
| # Scientific Explanation |
|
|
| SubatomZephyr doesn't generate language. |
|
|
| It doesn't reason. |
|
|
| It doesn't predict. |
|
|
| It doesn't even pretend anymore. |
|
|
| It has mastered exactly **one** skill: |
|
|
| ``` |
| Input: |
| |
| a |
| |
| Output: |
| |
| a |
| ``` |
|
|
| 100% accuracy. |
|
|
| Zero creativity. |
|
|
| Perfect confidence. |
|
|
| --- |
|
|
| # World Records |
|
|
| 🥇 Smallest Generative Transformer |
|
|
| 🏆 Highest Accuracy On The Letter "a" |
|
|
| 🥈 Lowest Grocery Bill (21 Parameters) |
|
|
| 🥉 First Model Smaller Than Most README Files |
|
|
| 🎖️ Certified Quantum Intelligence™ |
|
|
| --- |
|
|
| # Benchmarks |
|
|
| ``` |
| MMLU : 💀 |
| |
| HumanEval : 😂 |
| |
| TruthfulQA : 🤨 |
| |
| Binary Copy : 🏆 100% |
| |
| Entertainment : ⭐⭐⭐⭐⭐ |
| ``` |
|
|
| --- |
|
|
| # Frequently Asked Questions |
|
|
| ### Is this useful? |
|
|
| No. |
|
|
| ### Is this funny? |
|
|
| Hopefully. |
|
|
| ### Why does it exist? |
|
|
| Curiosity. |
|
|
| ### Can it beat GPT-4? |
|
|
| Only if the task is copying the letter "a". |
|
|
| ### What's next? |
|
|
| **QuarkZephyr.** |
|
|
| Probably. |
|
|
| --- |
|
|
| # Fun Facts |
|
|
| - Smaller than many favicon files. |
|
|
| - Downloads before you click download. |
|
|
| - Has fewer parameters than this README has paragraphs. |
|
|
| - The tokenizer is more complicated than the model. |
|
|
| - Uses more electricity displaying this README than running inference. |
|
|
| --- |
|
|
| # Limitations |
|
|
| SubatomZephyr should not be used for: |
|
|
| - Chatbots |
|
|
| - Coding |
|
|
| - Translation |
|
|
| - Math |
|
|
| - Science |
|
|
| - Existing in production |
|
|
| It excels primarily at making ML engineers laugh. |
|
|
| --- |
|
|
| # License |
|
|
| MIT |
|
|
| If you somehow improve this model... |
|
|
| please tell us. |
|
|
| We're genuinely curious. |
|
|
| --- |
|
|
| <div align="center"> |
|
|
| ## Built by WhirlwindAI |
|
|
| *"When there are no parameters left to remove... remove expectations instead."* |
|
|
| <br> |
|
|
| <img src="https://capsule-render.vercel.app/api?type=waving&height=130&section=footer&color=0:EC4899,100:8B5CF6"/> |
|
|
| </div> |