# Wire-Speed Transformer: Real-Time Learning from Live Network Streams
**A novel approach to transformer training that learns directly from network traffic in real time.**
## 🔥 Key Results
| Time | Tokens | Loss | Notes |
|------|--------|------|-------|
| 0s | 0 | - | Start |
| 14s | 10k | 50.08 | Initial |
| 192s | 100k | 22.32 | -55% |
| 302s | 170k | 16.78 | -66% |
| 355s | 190k | 15.91 | **-68%** |
**Loss dropped from 50 → 16 in under 6 minutes using only 32-token micro-batches from raw, uncurated web data.**
## 🧠 What Makes This Different
Traditional transformer training requires:
- Large batch sizes (4096+)
- Multiple epochs over curated data
- Expensive preprocessing pipelines
- Hours/days of training
Wire-Speed Learning uses:
- **32-token micro-batches** (128× smaller)
- **Single pass** (no epochs)
- **Raw web data** (no curation)
- **Online SGD** (one update every 32 tokens; sketched below)
- **Real-time network stream** (Rust crawler → Python trainer)
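
A minimal sketch of that update loop, assuming the feeder writes whitespace-separated token IDs to stdout (the actual wire format used by `stream_trainer.py` may differ):

```python
import sys
import torch
import torch.nn.functional as F

MICRO_BATCH = 32  # tokens per gradient step

def stream_tokens():
    """Yield token IDs as they arrive on stdin: one continuous stream, no epochs."""
    for line in sys.stdin:
        for tok in line.split():
            yield int(tok)

def train(model, optimizer, device="cuda"):
    buf = []
    for tok in stream_tokens():
        buf.append(tok)
        if len(buf) < MICRO_BATCH + 1:   # 32 inputs plus one shifted target
            continue
        ids = torch.tensor(buf, device=device).unsqueeze(0)
        logits = model(ids[:, :-1])      # next-token prediction
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               ids[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                 # one SGD update every 32 tokens
        buf = buf[-1:]                   # last token seeds the next window
```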
## 🏗️ Architecture
```
┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│  Rust Crawler   │────▶│  Tokenizer   │────▶│ Python Trainer  │
│  (500 workers)  │     │  (DeepSeek)  │     │  (36M params)   │
│  ~500 pages/s   │     │  128k vocab  │     │   ~500 tok/s    │
└─────────────────┘     └──────────────┘     └─────────────────┘
         │                                            │
         ▼                                            ▼
   Live Internet                              Gradient Update
  (no robots.txt)                            (every 32 tokens)
```
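
Note the rate mismatch: at ~500 pages/s the crawler fetches far more text than the ~500 tok/s trainer can consume, so most crawled tokens presumably never reach the model. The 32-token micro-batch then works out to roughly 15–16 optimizer steps per second:

```python
tok_per_s = 500                  # trainer throughput from the diagram
micro_batch = 32                 # tokens per SGD step
print(tok_per_s / micro_batch)   # 15.625 gradient updates per second
```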
## 📊 Model Config
```python
CONFIG = {
"d": 256, # embedding dim
"layers": 4, # transformer layers
"heads": 8, # attention heads
"rank": 32, # tuneable attention rank
"vocab": 128256, # DeepSeek V3.2 tokenizer
"ctx": 512, # context window
}
# Total: 35,993,088 parameters (36M)
```
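
A back-of-envelope count lands close to the reported total, assuming tied input/output embeddings and a 4× MLP expansion (neither is stated here, so treat both as assumptions):

```python
d, layers, vocab = 256, 4, 128256

embed = vocab * d        # token embeddings (assumed tied with the output head)
attn = 4 * d * d         # Q, K, V, O projections per layer
mlp = 2 * d * (4 * d)    # up + down projections per layer (4x expansion)
total = embed + layers * (attn + mlp)

print(f"{total:,}")      # 35,979,264 -- within 0.04% of the reported 35,993,088
```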
## 🚀 Quick Start
### Requirements
- CUDA GPU (8GB+ VRAM)
- Rust toolchain
- Python 3.8+
- PyTorch 2.0+
### Installation
```bash
# Clone
git clone https://huggingface.co/OpenTransformer/wire-speed-transformer
cd wire-speed-transformer
# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source ~/.cargo/env
# Build Rust crawler
cd feeder && cargo build --release && cd ..
# Download DeepSeek tokenizer
curl -sL https://huggingface.co/deepseek-ai/DeepSeek-V3.2/resolve/main/tokenizer.json -o tokenizer.json
# Install Python deps
pip install torch
# Run!
./feeder/target/release/wire_feeder 2>feeder.log | python3 stream_trainer.py
```
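
To confirm the tokenizer downloaded intact, a quick round-trip with the `tokenizers` library (`pip install tokenizers`):

```python
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")
ids = tok.encode("Wire-speed learning from live network streams.").ids
print(ids)
print(tok.decode(ids))
print(tok.get_vocab_size())  # should line up with CONFIG["vocab"] = 128256
```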
## 📁 Files
- `stream_trainer.py` - Python transformer trainer (online learning)
- `feeder/` - Rust high-speed web crawler + tokenizer
- `tokenizer.json` - DeepSeek V3.2 tokenizer (download separately)
- `run.sh` - Launch script
## 🔬 Why This Works (Hypotheses)
1. **Small models converge faster** - a 36M-parameter model needs far less data than a 7B one
2. **High update frequency** - more gradient steps per token seen, despite noisier gradients (quantified below)
3. **Web has structure** - HTML patterns and common phrases provide learning signal
4. **DeepSeek tokenizer** - high-quality tokenization from a SOTA model
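
To put hypothesis 2 in numbers: over the ~190k tokens of the run above, 32-token micro-batches yield two orders of magnitude more optimizer steps than a traditional 4096-token batch would:

```python
tokens = 190_000        # tokens seen in the 355 s run
print(tokens // 32)     # 5937 updates with 32-token micro-batches
print(tokens // 4096)   # 46 updates with a traditional 4096-token batch
```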
## ⚠️ Limitations
- No evaluation yet (just training loss)
- Model is tiny (36M) - won't match GPT-4
- Catastrophic forgetting not measured
- Raw web data quality unknown
## 📝 Citation
```bibtex
@misc{wirespeed2026,
  title={Wire-Speed Transformer: Real-Time Learning from Live Network Streams},
  author={OpenTransformers},
  year={2026},
  url={https://huggingface.co/OpenTransformer/wire-speed-transformer}
}
```
## 🙏 Acknowledgments
- DeepSeek for the tokenizer
- Anthropic's Claude for pair programming
- vast.ai for GPU compute
## 📜 License
MIT
---
*Built by OpenTransformers, pushing the boundaries of what's possible with transformers.*