# Wire-Speed Transformer: Real-Time Learning from Live Network Streams
**A novel approach to transformer training that learns directly from network traffic in real time.**
## 🔥 Key Results
| Time | Tokens | Loss | Notes |
|------|--------|------|-------|
| 0s | 0 | - | Start |
| 14s | 10k | 50.08 | Initial |
| 192s | 100k | 22.32 | -55% |
| 302s | 170k | 16.78 | -66% |
| 355s | 190k | 15.91 | **-68%** |
**Loss dropped from 50 → 16 in under 6 minutes using only 32-token micro-batches from raw, uncurated web data.**
## 🧠 What Makes This Different
Traditional transformer training requires:
- Large batch sizes (4096+)
- Multiple epochs over curated data
- Expensive preprocessing pipelines
- Hours/days of training
Wire-Speed Learning uses:
- **32-token micro-batches** (128× smaller)
- **Single pass** (no epochs)
- **Raw web data** (no curation)
- **Online SGD** (one update every 32 tokens; sketched below)
- **Real-time network stream** (Rust crawler → Python trainer)
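
A minimal sketch of that update loop, assuming the feeder writes whitespace-separated token IDs to stdout (the actual wire format used by `stream_trainer.py` may differ):

```python
import sys
import torch
import torch.nn.functional as F

MICRO_BATCH = 32  # tokens per gradient step

def stream_tokens():
    """Yield token IDs as they arrive on stdin: one continuous stream, no epochs."""
    for line in sys.stdin:
        for tok in line.split():
            yield int(tok)

def train(model, optimizer, device="cuda"):
    buf = []
    for tok in stream_tokens():
        buf.append(tok)
        if len(buf) < MICRO_BATCH + 1:   # 32 inputs plus one shifted target
            continue
        ids = torch.tensor(buf, device=device).unsqueeze(0)
        logits = model(ids[:, :-1])      # next-token prediction
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                               ids[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                 # one SGD update every 32 tokens
        buf = buf[-1:]                   # last token seeds the next window
```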
## 🏗️ Architecture
```
┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│  Rust Crawler   │────▶│  Tokenizer   │────▶│ Python Trainer  │
│  (500 workers)  │     │  (DeepSeek)  │     │  (36M params)   │
│  ~500 pages/s   │     │  128k vocab  │     │   ~500 tok/s    │
└─────────────────┘     └──────────────┘     └─────────────────┘
         │                                            │
         ▼                                            ▼
   Live Internet                              Gradient Update
  (no robots.txt)                            (every 32 tokens)
```
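
Note the rate mismatch: at ~500 pages/s the crawler fetches far more text than the ~500 tok/s trainer can consume, so most crawled tokens presumably never reach the model. The 32-token micro-batch then works out to roughly 15–16 optimizer steps per second:

```python
tok_per_s = 500                  # trainer throughput from the diagram
micro_batch = 32                 # tokens per SGD step
print(tok_per_s / micro_batch)   # 15.625 gradient updates per second
```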
## 📊 Model Config
```python
CONFIG = {
"d": 256, # embedding dim
"layers": 4, # transformer layers
"heads": 8, # attention heads
"rank": 32, # tuneable attention rank
"vocab": 128256, # DeepSeek V3.2 tokenizer
"ctx": 512, # context window
}
# Total: 35,993,088 parameters (36M)
```
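
A back-of-envelope count lands close to the reported total, assuming tied input/output embeddings and a 4× MLP expansion (neither is stated here, so treat both as assumptions):

```python
d, layers, vocab = 256, 4, 128256

embed = vocab * d        # token embeddings (assumed tied with the output head)
attn = 4 * d * d         # Q, K, V, O projections per layer
mlp = 2 * d * (4 * d)    # up + down projections per layer (4x expansion)
total = embed + layers * (attn + mlp)

print(f"{total:,}")      # 35,979,264 -- within 0.04% of the reported 35,993,088
```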
## 🚀 Quick Start
### Requirements
- CUDA GPU (8GB+ VRAM)
- Rust toolchain
- Python 3.8+
- PyTorch 2.0+
### Installation
```bash
# Clone
git clone https://huggingface.co/OpenTransformer/wire-speed-transformer
cd wire-speed-transformer
# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source ~/.cargo/env
# Build Rust crawler
cd feeder && cargo build --release && cd ..
# Download DeepSeek tokenizer
curl -sL https://huggingface.co/deepseek-ai/DeepSeek-V3.2/resolve/main/tokenizer.json -o tokenizer.json
# Install Python deps
pip install torch
# Run!
./feeder/target/release/wire_feeder 2>feeder.log | python3 stream_trainer.py
```
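
To confirm the tokenizer downloaded intact, a quick round-trip with the `tokenizers` library (`pip install tokenizers`):

```python
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")
ids = tok.encode("Wire-speed learning from live network streams.").ids
print(ids)
print(tok.decode(ids))
print(tok.get_vocab_size())  # should line up with CONFIG["vocab"] = 128256
```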
## 📁 Files
- `stream_trainer.py` - Python transformer trainer (online learning)
- `feeder/` - Rust high-speed web crawler + tokenizer
- `tokenizer.json` - DeepSeek V3.2 tokenizer (download separately)
- `run.sh` - Launch script
## 🔬 Why This Works (Hypotheses)
1. **Small models converge faster** - a 36M-parameter model needs far less data than a 7B one
2. **High update frequency** - more gradient steps per token seen, despite noisier gradients (quantified below)
3. **Web has structure** - HTML patterns and common phrases provide learning signal
4. **DeepSeek tokenizer** - high-quality tokenization from a SOTA model
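
To put hypothesis 2 in numbers: over the ~190k tokens of the run above, 32-token micro-batches yield two orders of magnitude more optimizer steps than a traditional 4096-token batch would:

```python
tokens = 190_000        # tokens seen in the 355 s run
print(tokens // 32)     # 5937 updates with 32-token micro-batches
print(tokens // 4096)   # 46 updates with a traditional 4096-token batch
```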
## ⚠️ Limitations
- No evaluation yet (just training loss)
- Model is tiny (36M) - won't match GPT-4
- Catastrophic forgetting not measured
- Raw web data quality unknown
## 📝 Citation
```bibtex
@misc{wirespeed2026,
  title={Wire-Speed Transformer: Real-Time Learning from Live Network Streams},
  author={OpenTransformers},
  year={2026},
  url={https://huggingface.co/OpenTransformer/wire-speed-transformer}
}
```
## 🙏 Acknowledgments
- DeepSeek for the tokenizer
- Anthropic's Claude for pair programming
- vast.ai for GPU compute
## 📜 License
MIT
---
*Built by OpenTransformers, pushing the boundaries of what's possible with transformers.*