---
license: mit
language:
- en
pipeline_tag: text-generation
tags:
- transformer
- gpt2
- nano
- experimental
- tiny
- text-generation
- whirlwindai
new_version: WhirlwindAI/AtomZephyr
---

<div align="center">

<img src="https://readme-typing-svg.demolab.com?font=Space+Grotesk&weight=700&size=26&duration=2200&pause=1200&color=F97316&center=true&vCenter=true&width=760&lines=NanoZephyr;372+Parameters.;Tiny+Enough+to+Fit+Everywhere.;Powered+by+Questionable+Decisions." />

<br><br>

<img src="https://capsule-render.vercel.app/api?type=venom&height=200&text=NanoZephyr&animation=fadeIn&color=0:F857A6,100:FF5858"/>

<br>

<img src="https://img.shields.io/badge/Parameters-372-F97316?style=for-the-badge">
<img src="https://img.shields.io/badge/Architecture-GPT--2-F59E0B?style=for-the-badge">
<img src="https://img.shields.io/badge/Status-Experimental-FACC15?style=for-the-badge">
<img src="https://img.shields.io/badge/Humor-Included-EAB308?style=for-the-badge">

<sub><b>The entire model is smaller than some README files.</b></sub>

</div>

---
<div align="center">

# The World's Tiniest Transformer

Big language models chase billions of parameters.

NanoZephyr went the opposite direction.

With only **372 parameters**, this tiny GPT-2 style model explores just how absurdly small a language model can become while still technically generating text.

It's not smart.

It was never supposed to be.

</div>

---
# Why?

Sometimes research starts with one simple question.

> **"How small can we make a Transformer before it becomes completely ridiculous?"**

NanoZephyr is our answer.

- Built for fun.
- Built for curiosity.
- Built to make AI researchers laugh.

---
# Specifications

<div align="center">

| Property | Value |
|:---------|:-----:|
| Parameters | **372** |
| Architecture | GPT-2 Style |
| Layers | 1 |
| Attention Heads | 1 |
| Embedding Size | 4 |
| Feed Forward | 8 |
| Vocabulary | 32 Tokens |
| Context Length | 16 |
| Model Size | ~15 KB |
| Training | CPU |

</div>
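
The 372 figure can be reproduced by hand from the table above. The sketch below assumes the standard GPT-2 layout (learned position embeddings, biases on every linear layer, two layer norms per block plus a final one) and an output head tied to the token embedding. None of that is stated explicitly in this card, and `gpt2_param_count` is a hypothetical helper, not part of any library.

```python
def gpt2_param_count(vocab, context, d_model, d_ff, n_layers):
    """Count parameters of a GPT-2 style model with a tied output head (assumed)."""
    token_emb = vocab * d_model                      # token embedding table
    pos_emb = context * d_model                      # learned position embeddings
    layer_norm = 2 * d_model                         # scale + bias
    attn_qkv = d_model * 3 * d_model + 3 * d_model   # fused Q, K, V projection
    attn_out = d_model * d_model + d_model           # attention output projection
    mlp_in = d_model * d_ff + d_ff                   # feed-forward expansion
    mlp_out = d_ff * d_model + d_model               # feed-forward contraction
    block = 2 * layer_norm + attn_qkv + attn_out + mlp_in + mlp_out
    final_norm = layer_norm
    # The output head reuses token_emb, so it adds nothing here.
    return token_emb + pos_emb + n_layers * block + final_norm


total = gpt2_param_count(vocab=32, context=16, d_model=4, d_ff=8, n_layers=1)
print(total)  # 372
```

The head count does not appear in the function because splitting attention into heads does not change the number of weights. Under these assumptions the two embedding tables account for 192 of the 372 parameters, so more than half of the model is a lookup table.
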

---

# Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("WhirlwindAI/NanoZephyr")
model = AutoModelForCausalLM.from_pretrained("WhirlwindAI/NanoZephyr")

prompt = "The future of AI"
inputs = tokenizer(prompt, return_tensors="pt")

# max_length counts the prompt tokens too, and the context window is only
# 16 tokens, so a prompt that already fills it leaves nothing to generate.
outputs = model.generate(
    **inputs,
    max_length=16,
    do_sample=True,
    temperature=2.0,  # high temperature flattens the sampling distribution
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Example output:

```text
The future of AI vxbzq rpfm lo...
```

Beautiful.
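
The `temperature=2.0` setting in the example is part of why the output looks so scrambled. As a rough sketch of the idea (plain Python, with made-up logits rather than anything read from NanoZephyr), dividing the logits by the temperature before the softmax flattens the distribution, so unlikely tokens get sampled more often:

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for four tokens; not taken from the model.
logits = [2.0, 1.0, 0.5, 0.0]

sharp = softmax_with_temperature(logits, temperature=1.0)
flat = softmax_with_temperature(logits, temperature=2.0)

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

At the higher temperature the top token loses probability mass and the least likely token gains it, which pushes sampling closer to uniform.
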

---

# Live System Status

```text
┌───────────────────────────────┐
│   NanoZephyr Boot Sequence    │
├───────────────────────────────┤
│ Parameters   : 372            │
│ GPU Usage    : Basically none │
│ Intelligence : ░░░░░░░░░░ 3%  │
│ Confidence   : ██████████ 100%│
│ Randomness   : ██████████ MAX │
│ Status       : ONLINE         │
└───────────────────────────────┘
```

---
# Performance

<div align="center">

| Benchmark | Result |
|:----------|-------:|
| Common Sense | 0.01 |
| Mathematics | 0.00 |
| Philosophy | ??? |
| Gibberish | 100.00 |
| Comedy | ∞ |

</div>

---
# Sample Outputs

```text
Input:
hello

Output:
helloclvtdzng o
```

```text
Input:
ROMEO:

Output:
ufbgdyo zia
```

```text
Input:
Once upon a time...

Output:
qxwwbbvh zjv
```

Every generation is a surprise.

Sometimes even to the model.

---
# Intended Uses

✅ Learning how Transformers work

✅ Educational demos

✅ Parameter-count experiments

✅ AI memes

✅ Making your 70B model feel better

---
# Not Intended For

❌ Homework

❌ Medical advice

❌ Programming

❌ Legal documents

❌ Anything requiring intelligence

---
# Awards

🥇 Smallest Transformer (probably)

🥇 Highest Gibberish Density

🥇 Lowest Storage Requirement

🥇 Fastest CPU Training

🏆 Most Honest AI Model

---
# License

MIT

Feel free to use it, modify it, laugh at it, or make it even smaller.

---
<div align="center">

<img src="https://readme-typing-svg.demolab.com?font=Fira+Code&weight=600&size=17&duration=1800&pause=1100&color=F97316&center=true&vCenter=true&width=700&lines=loading+372+parameters...;thinking...;still+thinking...;generated+gibberish.;mission+complete." />

<br><br>

<img src="https://img.shields.io/badge/Built%20by-WhirlwindAI-F97316?style=for-the-badge">
<img src="https://img.shields.io/badge/Open-Research-F59E0B?style=for-the-badge">
<img src="https://img.shields.io/badge/Experimental-AI-FACC15?style=for-the-badge">

<br><br>

<img src="https://capsule-render.vercel.app/api?type=waving&height=220&text=372%20PARAMETERS&fontSize=50&animation=twinkling&fontColor=ffffff&color=0:FB923C,50:FACC15,100:FDE68A"/>

</div>