# Wire-Speed Transformer: Real-Time Learning from Live Network Streams
**A novel approach to transformer training that learns directly from network traffic in real time.**
## 🔥 Key Results
| Time | Tokens | Loss  | Notes     |
|------|--------|-------|-----------|
| 0s   | 0      | -     | Start     |
| 14s  | 10k    | 50.08 | Initial   |
| 192s | 100k   | 22.32 | -55%      |
| 302s | 170k   | 16.78 | -66%      |
| 355s | 190k   | 15.91 | **-68%**  |
**Loss dropped from 50 → 16 in under 6 minutes using only 32-token micro-batches from raw, uncurated web data.**
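The percentage figures in the Notes column are measured against the first recorded loss (50.08 at 14s), not the untrained start; a quick check of the arithmetic:
```python
# Verify the reduction figures in the table above, relative to the
# first recorded loss (50.08 at 14s).
baseline = 50.08
for tokens, loss in [(100_000, 22.32), (170_000, 16.78), (190_000, 15.91)]:
    drop = (baseline - loss) / baseline
    print(f"{tokens:>7,} tokens: loss {loss:.2f} ({drop:.0%} below baseline)")
# 100,000 tokens: loss 22.32 (55% below baseline)
# 170,000 tokens: loss 16.78 (66% below baseline)
# 190,000 tokens: loss 15.91 (68% below baseline)
```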
## 🧠 What Makes This Different
Traditional transformer training requires:
- Large batch sizes (4096+)
- Multiple epochs over curated data
- Expensive preprocessing pipelines
- Hours/days of training
Wire-Speed Learning uses:
- **32-token micro-batches** (128x smaller)
- **Single pass** (no epochs)
- **Raw web data** (no curation)
- **Online SGD** (one update every 32 tokens; see the sketch below)
- **Real-time network stream** (Rust crawler → Python trainer)
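As a rough illustration of that update loop (not the actual implementation, which lives in `stream_trainer.py`; `model` and `token_stream` are placeholders here):
```python
import torch
import torch.nn.functional as F

MICRO_BATCH = 32

def train_online(model, token_stream, lr=3e-4, device="cuda"):
    """One SGD step per 32-token micro-batch: single pass, no shuffling."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    buf = []
    for tok in token_stream:            # tokens arrive as the crawler produces them
        buf.append(tok)
        if len(buf) < MICRO_BATCH + 1:  # 32 inputs plus one shifted target
            continue
        ids = torch.tensor(buf, device=device)
        logits = model(ids[:-1].unsqueeze(0))         # (1, 32, vocab)
        loss = F.cross_entropy(logits.squeeze(0), ids[1:])
        opt.zero_grad()
        loss.backward()
        opt.step()                      # one gradient update every 32 tokens
        buf = buf[-1:]                  # carry the last token over for continuity
```
Every token is seen exactly once, which is what "single pass" means here.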
## 🏗️ Architecture
```
┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│  Rust Crawler   │────▶│  Tokenizer   │────▶│  Python Trainer │
│  (500 workers)  │     │  (DeepSeek)  │     │  (36M params)   │
│  ~500 pages/s   │     │  128k vocab  │     │  ~500 tok/s     │
└─────────────────┘     └──────────────┘     └─────────────────┘
        │                                            │
        ▼                                            ▼
   Live Internet                              Gradient Update
  (no robots.txt)                            (every 32 tokens)
```
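The two stages are connected by a plain Unix pipe (see the run command under Quick Start). Here is a sketch of the receiving side, assuming the feeder writes token IDs to stdout as little-endian u32 values; the actual framing is whatever `wire_feeder` emits, so treat this as illustrative only:
```python
import struct
import sys

def token_stream():
    """Yield token IDs read from the feeder over stdin.

    Assumes little-endian u32 framing; adjust to wire_feeder's
    actual output format.
    """
    raw = sys.stdin.buffer
    while True:
        chunk = raw.read(4 * 32)        # roughly one micro-batch per read
        if len(chunk) < 4:
            break                       # feeder closed the pipe
        n = len(chunk) // 4
        yield from struct.unpack(f"<{n}I", chunk[:4 * n])
```
A generator like this could feed the `train_online` sketch above directly.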
## 📐 Model Config
```python
CONFIG = {
    "d": 256,         # embedding dim
    "layers": 4,      # transformer layers
    "heads": 8,       # attention heads
    "rank": 32,       # tuneable attention rank
    "vocab": 128256,  # DeepSeek V3.2 tokenizer
    "ctx": 512,       # context window
}
# Total: 35,993,088 parameters (36M)
```
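That total is dominated by the embedding table; a back-of-envelope check (the exact split of the remainder depends on how the rank-32 attention is implemented):
```python
# The token embedding table alone accounts for ~91% of the 36M total;
# the remaining ~3.2M parameters sit in the 4 transformer layers.
d, vocab, total = 256, 128256, 35_993_088
embedding = vocab * d                       # 32,833,536
print(f"embedding: {embedding:,} ({embedding / total:.0%} of total)")
print(f"remainder: {total - embedding:,}")  # 3,159,552
```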
## 🚀 Quick Start
### Requirements
- CUDA GPU (8GB+ VRAM)
- Rust toolchain
- Python 3.8+
- PyTorch 2.0+
### Installation
```bash
# Clone
git clone https://huggingface.co/OpenTransformer/wire-speed-transformer
cd wire-speed-transformer

# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source ~/.cargo/env

# Build Rust crawler
cd feeder && cargo build --release && cd ..

# Download DeepSeek tokenizer
curl -sL https://huggingface.co/deepseek-ai/DeepSeek-V3.2/resolve/main/tokenizer.json -o tokenizer.json

# Install Python deps
pip install torch

# Run!
./feeder/target/release/wire_feeder 2>feeder.log | python3 stream_trainer.py
```
## 📁 Files
- `stream_trainer.py` - Python transformer trainer (online learning)
- `feeder/` - Rust high-speed web crawler + tokenizer
- `tokenizer.json` - DeepSeek V3.2 tokenizer (download separately)
- `run.sh` - Launch script
## 🔬 Why This Works (Hypotheses)
1. **Small models converge faster** - 36M parameters need far less data to fit than 7B
2. **High update frequency** - More gradient signal per token despite the noise (quantified below)
3. **Web has structure** - HTML patterns and common phrases provide a learning signal
4. **DeepSeek tokenizer** - High-quality tokenization from a SOTA model
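To put hypothesis 2 in numbers, treating the traditional baseline as 4096-token batches purely for comparison:
```python
# Update counts for the same 190k-token budget as the run above,
# comparing 32-token micro-batches to a (hypothetical) 4096-token batch.
tokens = 190_000
print(tokens // 32)    # 5937 gradient updates at 32 tokens/update
print(tokens // 4096)  # 46 updates at 4096 tokens/update
```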
## ⚠️ Limitations
- No evaluation yet (just training loss)
- Model is tiny (36M) - won't match GPT-4
- Catastrophic forgetting not measured
- Raw web data quality unknown
## 📄 Citation
```bibtex
@misc{wirespeed2026,
  title={Wire-Speed Transformer: Real-Time Learning from Live Network Streams},
  author={OpenTransformers},
  year={2026},
  url={https://huggingface.co/OpenTransformer/wire-speed-transformer}
}
```
## 🙏 Acknowledgments
- DeepSeek for the tokenizer
- Anthropic's Claude for pair programming
- vast.ai for GPU compute
## 📜 License
MIT
---
*Built by OpenTransformers - Pushing the boundaries of what's possible with transformers.*