---
license: mit
language:
- en
pipeline_tag: text-generation
tags:
- transformer
- gpt2
- tiny
- atom
- experimental
- humor
- whirlwindai
new_version: WhirlwindAI/SubatomZephyr
---

<div align="center">

<img src="https://readme-typing-svg.demolab.com?font=Space+Grotesk&weight=700&size=27&duration=1900&pause=900&color=22C55E&center=true&vCenter=true&width=850&lines=AtomZephyr;27+Parameters.;Almost+an+LLM.;Mostly+a+Science+Experiment.;Powered+by+Pure+Curiosity." />

<br>

<img src="https://img.shields.io/badge/Parameters-27-22C55E?style=for-the-badge">
<img src="https://img.shields.io/badge/Architecture-GPT--2-10B981?style=for-the-badge">
<img src="https://img.shields.io/badge/Status-Experimental-14B8A6?style=for-the-badge">
<img src="https://img.shields.io/badge/Braincells-27-06B6D4?style=for-the-badge">

<br><br>

<img src="https://capsule-render.vercel.app/api?type=soft&height=190&text=AtomZephyr&fontColor=ffffff&fontSize=48&animation=blinking&color=0:22C55E,100:06B6D4"/>

</div>

---

# The Idea

<div align="center">

<table width="92%">

<tr>

<td align="center">

## What if a transformer became... microscopic?

AtomZephyr explores one of the smallest practical transformer architectures ever built.

Not because anyone asked for it.

Because someone eventually had to answer the question:

**"How absurdly small can an AI become before it forgets how to AI?"**

Turns out...

**27 parameters is still technically enough.**

</td>

</tr>

</table>

</div>

---

# Why?

Most AI models compete by getting bigger.

AtomZephyr competes by removing parameters until people start questioning whether it's still a neural network.

Every parameter had to earn its place.

Most didn't.

---

# Specifications

| Property | Value |
|-----------|-------|
| Parameters | **27** |
| Architecture | GPT-2 |
| Layers | 1 |
| Attention Heads | 1 |
| Embedding Size | 1 |
| FFN Size | 1 |
| Context Length | 4 tokens |
| Vocabulary | 5 tokens |
| Model Size | <5 KB |
| Training Time | ~6 seconds (CPU) |

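
The 27 in the table can be cross-checked by hand. The sketch below is not taken from the AtomZephyr training code. It assumes the standard GPT-2 layout (learned token and position embeddings, a fused Q/K/V projection, a bias on every linear layer, LayerNorm weight and bias, and an output head tied to the token embeddings) and plugs in the values from the table. The function name `gpt2_param_count` is hypothetical. Under these assumptions the embeddings account for 9 parameters, the single block for 16, and the final LayerNorm for 2.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_inner, n_layer):
    """Count trainable parameters for a GPT-2 style model (assumed layout)."""
    token_emb = vocab_size * n_embd      # token embeddings, assumed tied to the output head
    pos_emb = n_positions * n_embd       # learned position embeddings
    layer_norm = 2 * n_embd              # weight + bias

    attn_qkv = n_embd * 3 * n_embd + 3 * n_embd  # fused Q, K, V projection + bias
    attn_out = n_embd * n_embd + n_embd          # attention output projection + bias
    ffn_in = n_embd * n_inner + n_inner          # first feed-forward layer + bias
    ffn_out = n_inner * n_embd + n_embd          # second feed-forward layer + bias

    # Each block has two LayerNorms, the attention projections and the FFN.
    per_layer = 2 * layer_norm + attn_qkv + attn_out + ffn_in + ffn_out
    final_norm = layer_norm
    return token_emb + pos_emb + n_layer * per_layer + final_norm


# Values from the Specifications table.
total = gpt2_param_count(vocab_size=5, n_positions=4, n_embd=1, n_inner=1, n_layer=1)
print(total)  # 27
```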

---

# Performance

| Test | Result |
|------|--------|
| Understand English | ❌ |
| Write Code | ❌ |
| Solve Math | ❌ |
| Generate "abba" | ✅ |
| Break Expectations | ✅ |

---

# Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the 27-parameter model from the Hub.
tokenizer = AutoTokenizer.from_pretrained("WhirlwindAI/AtomZephyr")
model = AutoModelForCausalLM.from_pretrained("WhirlwindAI/AtomZephyr")

prompt = "a"

inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to the full 4-token context; the high temperature keeps the output varied.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.7,
    max_length=4,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Possible output:

```
abaa
```

Groundbreaking.

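
The Quick Start call samples with `temperature=1.7`. As a rough illustration of what that setting does over a 5-token vocabulary, the sketch below applies standard temperature scaling to a set of made-up logits. The numbers are hypothetical and not read from AtomZephyr, and `softmax_with_temperature` is an illustrative helper, not part of `transformers`. Dividing the logits by a temperature above 1 flattens the distribution, so unlikely tokens get sampled more often.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature (standard temperature scaling)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a 5-token vocabulary; not taken from AtomZephyr.
logits = [2.0, 1.0, 0.0, -1.0, -2.0]

plain = softmax_with_temperature(logits, 1.0)
hot = softmax_with_temperature(logits, 1.7)

# At temperature 1.7 the most likely token loses probability mass
# and the least likely token gains some.
print("top token:   ", round(plain[0], 3), "->", round(hot[0], 3))
print("bottom token:", round(plain[-1], 3), "->", round(hot[-1], 3))
```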

---

# Example Conversation

**User**

> Tell me a joke.

**AtomZephyr**

```
abba
```

Technically...

that's an answer.

---

# Scientific Achievement

Removing parameters is easy.

Keeping a transformer alive afterwards...

isn't.

AtomZephyr exists purely to explore the absolute lower limits of transformer architectures while remaining a real, trainable language model.

Whether it's useful is a completely different discussion.

---

# Awards

Smallest Model That Still Has Self-Respect

Best Binary Poetry Generator

Most Efficient Waste Of Six Seconds

Official Representative Of Tiny AI

---

# Limitations

AtomZephyr should **not** be used for:

- Programming
- Translation
- Question Answering
- Homework
- Anything important

It performs significantly better when asked to do absolutely nothing useful.

---

# Fun Facts

- Fits inside most PNG images.
- Smaller than many neural network tutorials.
- Downloads faster than this README loads.
- Has fewer parameters than some calculator manuals have pages.

---

# License

MIT

Take it apart.

Make it smaller.

Break another record.

---

<div align="center">

### Built by WhirlwindAI

*"Sometimes progress isn't measured in billions... it's measured in what you can remove."*

<br>

<img src="https://capsule-render.vercel.app/api?type=waving&height=120&section=footer&color=0:22C55E,100:06B6D4"/>

</div>