# Wire-Speed Transformer: Real-Time Learning from Live Network Streams

**A novel approach to transformer training that learns directly from network traffic in real time.**

## Key Results

| Time | Tokens | Loss | Notes |
|------|--------|------|-------|
| 0s   | 0      | -     | Start |
| 14s  | 10k    | 50.08 | Initial |
| 192s | 100k   | 22.32 | -55% |
| 302s | 170k   | 16.78 | -66% |
| 355s | 190k   | 15.91 | **-68%** |

**Loss dropped from 50 → 16 in under 6 minutes using only 32-token micro-batches from raw, uncurated web data.**

## What Makes This Different

Traditional transformer training requires:
- Large batch sizes (4096+)
- Multiple epochs over curated data
- Expensive preprocessing pipelines
- Hours or days of training

Wire-Speed Learning uses:
- **32-token micro-batches** (128x smaller)
- **Single pass** (no epochs)
- **Raw web data** (no curation)
- **Online SGD** (update every 32 tokens)
- **Real-time network stream** (Rust crawler → Python trainer)
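
The list above is the whole training recipe, and the update rule is plain online SGD. As a dependency-free illustration (not the real `stream_trainer.py` — the vocabulary, learning rate, and simulated stream below are invented for the demo), here is the same every-32-tokens update applied to a bigram logit table:

```python
import math
import random

VOCAB, MICRO_BATCH, LR = 16, 32, 0.5
random.seed(0)

# Bigram "model": logits[prev][next], trained purely online.
logits = [[0.0] * VOCAB for _ in range(VOCAB)]

def sgd_step(tokens):
    """One gradient update from a single micro-batch of tokens."""
    loss = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        row = logits[prev]
        m = max(row)                      # softmax with max-subtraction
        exps = [math.exp(x - m) for x in row]
        z = sum(exps)
        probs = [e / z for e in exps]
        loss += -math.log(probs[nxt])
        # Cross-entropy gradient: probs - onehot(next), averaged over the batch.
        for j in range(VOCAB):
            row[j] -= LR * (probs[j] - (1.0 if j == nxt else 0.0)) / len(tokens)
    return loss / (len(tokens) - 1)

# Simulated "structured web" stream: token i is usually followed by i+1.
stream, t = [], 0
for _ in range(2000):
    t = (t + 1) % VOCAB if random.random() < 0.9 else random.randrange(VOCAB)
    stream.append(t)

losses = [sgd_step(stream[i:i + MICRO_BATCH])
          for i in range(0, len(stream) - MICRO_BATCH, MICRO_BATCH)]
print(f"first micro-batch loss {losses[0]:.2f} -> last {losses[-1]:.2f}")
```

The loop has the same shape at 36M parameters: consume 32 tokens, compute cross-entropy, take one gradient step, repeat — no batching queue, no epochs.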

## Architecture

```
┌─────────────────┐      ┌──────────────┐      ┌─────────────────┐
│  Rust Crawler   │─────▶│  Tokenizer   │─────▶│ Python Trainer  │
│  (500 workers)  │      │  (DeepSeek)  │      │  (36M params)   │
│  ~500 pages/s   │      │  128k vocab  │      │   ~500 tok/s    │
└─────────────────┘      └──────────────┘      └─────────────────┘
         │                                              │
         ▼                                              ▼
   Live Internet                                Gradient Update
  (no robots.txt)                              (every 32 tokens)
```

## Model Config

```python
CONFIG = {
    "d": 256,         # embedding dim
    "layers": 4,      # transformer layers
    "heads": 8,       # attention heads
    "rank": 32,       # tunable attention rank
    "vocab": 128256,  # DeepSeek V3.2 tokenizer
    "ctx": 512,       # context window
}
# Total: 35,993,088 parameters (36M)
```

## Quick Start

### Requirements

- CUDA GPU (8GB+ VRAM)
- Rust toolchain
- Python 3.8+
- PyTorch 2.0+

### Installation

```bash
# Clone
git clone https://huggingface.co/OpenTransformer/wire-speed-transformer
cd wire-speed-transformer

# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source ~/.cargo/env

# Build the Rust crawler
cd feeder && cargo build --release && cd ..

# Download the DeepSeek tokenizer
curl -sL https://huggingface.co/deepseek-ai/DeepSeek-V3.2/resolve/main/tokenizer.json -o tokenizer.json

# Install Python deps
pip install torch

# Run!
./feeder/target/release/wire_feeder 2>feeder.log | python3 stream_trainer.py
```
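
The exact wire format between `wire_feeder` and `stream_trainer.py` is defined in those files; assuming, purely for illustration, that the feeder wrote whitespace-separated token IDs, the trainer side of the pipe could group them like this:

```python
import sys

MICRO_BATCH = 32  # tokens per gradient update

def micro_batches(lines, batch_size=MICRO_BATCH):
    """Group whitespace-separated token IDs from a line stream into
    fixed-size micro-batches, yielding each one as soon as it fills."""
    buf = []
    for line in lines:
        for tok in line.split():
            buf.append(int(tok))
            if len(buf) == batch_size:
                yield buf
                buf = []

# In the trainer this would be: for batch in micro_batches(sys.stdin): ...
demo = list(micro_batches(["1 2 3", "4 5 6 7 8"], batch_size=4))
print(demo)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

A binary framing (fixed-width integers) would be faster than text; the point is only that the trainer can start updating as soon as the first 32 tokens arrive.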

## Files

- `stream_trainer.py` - Python transformer trainer (online learning)
- `feeder/` - Rust high-speed web crawler + tokenizer
- `tokenizer.json` - DeepSeek V3.2 tokenizer (download separately)
- `run.sh` - Launch script

## Why This Works (Hypotheses)

1. **Small models converge faster** - 36M params need far less data than 7B
2. **High update frequency** - more gradient signal per token, despite the noise
3. **The web has structure** - HTML patterns and common phrases provide learning signal
4. **DeepSeek tokenizer** - high-quality tokenization from a SOTA model
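
Hypothesis 2 is easy to quantify. At the token budget from the results table, 32-token micro-batches yield over a hundred times more gradient updates than a conventional 4096-token batch would (the step counts below ignore any gradient accumulation):

```python
tokens = 190_000         # token budget from the results table
micro, large = 32, 4096  # wire-speed micro-batch vs. a conventional batch

micro_steps = tokens // micro  # gradient updates with 32-token batches
large_steps = tokens // large  # gradient updates with 4096-token batches
print(micro_steps, large_steps)  # 5937 vs. 46
```

Each individual step is noisier, but the model sees 128x as many opportunities to correct itself per token consumed.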

## Limitations

- No evaluation yet (just training loss)
- Model is tiny (36M) - won't match GPT-4
- Catastrophic forgetting not measured
- Raw web data quality unknown

## Citation

```bibtex
@misc{wirespeed2026,
  title={Wire-Speed Transformer: Real-Time Learning from Live Network Streams},
  author={OpenTransformers},
  year={2026},
  url={https://huggingface.co/OpenTransformer/wire-speed-transformer}
}
```

## Acknowledgments

- DeepSeek for the tokenizer
- Anthropic's Claude for pair programming
- vast.ai for GPU compute

## License

MIT

---

*Built by OpenTransformers - pushing the boundaries of what's possible with transformers.*