OpenTransformer committed · verified
Commit 4cfe745 · 1 Parent(s): 31554a2

Upload README.md with huggingface_hub

Files changed (1): README.md (+136 −0)

README.md ADDED

# Wire-Speed Transformer: Real-Time Learning from Live Network Streams

**A novel approach to transformer training that learns directly from network traffic in real time.**

## 🔥 Key Results

| Time | Tokens | Loss | Notes |
|------|--------|------|-------|
| 0s | 0 | - | Start |
| 14s | 10k | 50.08 | Initial |
| 192s | 100k | 22.32 | -55% |
| 302s | 170k | 16.78 | -66% |
| 355s | 190k | 15.91 | **-68%** |

**Loss dropped from 50 → 16 in under 6 minutes using only 32-token micro-batches from raw, uncurated web data.**

## 🧠 What Makes This Different

Traditional transformer training requires:
- Large batch sizes (4096+)
- Multiple epochs over curated data
- Expensive preprocessing pipelines
- Hours or days of training

Wire-Speed Learning uses:
- **32-token micro-batches** (128× smaller than a 4096 batch)
- **A single pass** (no epochs)
- **Raw web data** (no curation)
- **Online SGD** (one update every 32 tokens; see the sketch below)
- **A real-time network stream** (Rust crawler → Python trainer)
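
The whole training loop is just that last pair of bullets repeated: whenever 32 fresh tokens arrive, take one SGD step on them. Below is a minimal sketch of the idea in PyTorch. It is not the code in `stream_trainer.py`: the stand-in model, names, and learning rate are illustrative, and it ignores how the 512-token context window interacts with the 32-token updates, which this README does not spell out.

```python
import torch
import torch.nn as nn

VOCAB, D, WINDOW = 128_256, 256, 32


class TinyLM(nn.Module):
    """Stand-in for the real 4-layer transformer in stream_trainer.py."""

    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, D)
        self.out = nn.Linear(D, VOCAB, bias=False)

    def forward(self, ids):                # ids: (batch, seq)
        return self.out(self.emb(ids))     # logits: (batch, seq, vocab)


model = TinyLM()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)   # illustrative lr
loss_fn = nn.CrossEntropyLoss()


def online_step(window):
    """One gradient update on a single 32-token window (next-token prediction)."""
    ids = torch.tensor(window).unsqueeze(0)           # (1, 32)
    logits = model(ids[:, :-1])                       # predict token t+1 from token t
    loss = loss_fn(logits.reshape(-1, VOCAB), ids[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()                                        # one update per 32-token window
    return loss.item()
```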

## 🏗️ Architecture

```
┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│  Rust Crawler   │────▶│  Tokenizer   │────▶│  Python Trainer │
│  (500 workers)  │     │  (DeepSeek)  │     │   (36M params)  │
│  ~500 pages/s   │     │  128k vocab  │     │   ~500 tok/s    │
└─────────────────┘     └──────────────┘     └─────────────────┘
         │                                            │
         ▼                                            ▼
   Live Internet                               Gradient Update
  (no robots.txt)                             (every 32 tokens)
```
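
The crawler and trainer are connected by nothing more than a Unix pipe (see the Quick Start below). How token ids are framed on that pipe is not documented in this README, so the whitespace-separated-integer format in this sketch is an assumption; only the 32-token windowing comes from the design above.

```python
import sys

WINDOW = 32


def windows_from_stdin():
    """Yield fixed 32-token windows from token ids arriving on stdin.

    Assumes whitespace-separated integer ids, one chunk per line; the real
    wire format used by wire_feeder / stream_trainer.py may differ.
    """
    buf = []
    for line in sys.stdin:
        buf.extend(int(tok) for tok in line.split())
        while len(buf) >= WINDOW:
            yield buf[:WINDOW]
            del buf[:WINDOW]


# Combined with the online_step() sketch above, the trainer loop is simply:
#   for window in windows_from_stdin():
#       loss = online_step(window)
```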

## 📊 Model Config

```python
CONFIG = {
    "d": 256,         # embedding dim
    "layers": 4,      # transformer layers
    "heads": 8,       # attention heads
    "rank": 32,       # tuneable attention rank
    "vocab": 128256,  # DeepSeek V3.2 tokenizer
    "ctx": 512,       # context window
}
# Total: 35,993,088 parameters (36M)
```
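
A back-of-the-envelope look at where those 36M parameters live: with a 128k-entry vocabulary and d = 256, the embedding table alone is about 33M parameters, roughly 90% of the model. The sketch below ignores layer norms and the rank-32 attention details, so it lands slightly under the reported total.

```python
# Rough parameter accounting from CONFIG (ignores norms and the rank-32
# attention variant, hence slightly under the reported 35,993,088).
d, layers, vocab = 256, 4, 128_256

embedding = vocab * d                  # 32,833,536 -- the dominant term
attn      = 4 * d * d                  # Q, K, V, O projections per layer
mlp       = 2 * d * (4 * d)            # up- and down-projections per layer
blocks    = layers * (attn + mlp)      # 3,145,728

print(f"{embedding + blocks:,}")       # 35,979,264
```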

## 🚀 Quick Start

### Requirements

- CUDA GPU (8GB+ VRAM)
- Rust toolchain
- Python 3.8+
- PyTorch 2.0+

### Installation

```bash
# Clone
git clone https://huggingface.co/OpenTransformer/wire-speed-transformer
cd wire-speed-transformer

# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source ~/.cargo/env

# Build Rust crawler
cd feeder && cargo build --release && cd ..

# Download DeepSeek tokenizer
curl -sL https://huggingface.co/deepseek-ai/DeepSeek-V3.2/resolve/main/tokenizer.json -o tokenizer.json

# Install Python deps
pip install torch

# Run!
./feeder/target/release/wire_feeder 2>feeder.log | python3 stream_trainer.py
```
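
Before launching, it can be worth sanity-checking the downloaded `tokenizer.json`. The snippet below uses the `tokenizers` package, an extra dependency not listed above (`pip install tokenizers`):

```python
# Quick check of tokenizer.json (extra dependency: pip install tokenizers).
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")
print(tok.get_vocab_size())                    # compare with CONFIG["vocab"] = 128256
print(tok.encode("wire-speed learning").ids)   # a handful of token ids
```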

## 📁 Files

- `stream_trainer.py` - Python transformer trainer (online learning)
- `feeder/` - Rust high-speed web crawler + tokenizer
- `tokenizer.json` - DeepSeek V3.2 tokenizer (download separately)
- `run.sh` - Launch script

## 🔬 Why This Works (Hypotheses)

1. **Small models converge faster** - a 36M-parameter model needs far less data than a 7B one
2. **High update frequency** - more gradient signal per second, despite the noise of tiny batches
3. **The web has structure** - HTML patterns and common phrases provide learning signal
4. **DeepSeek tokenizer** - high-quality tokenization borrowed from a SOTA model

## ⚠️ Limitations

- No evaluation yet (only training loss)
- The model is tiny (36M) and won't match GPT-4
- Catastrophic forgetting is not measured
- Raw web data quality is unknown

## 📝 Citation

```bibtex
@misc{wirespeed2026,
  title={Wire-Speed Transformer: Real-Time Learning from Live Network Streams},
  author={OpenTransformers},
  year={2026},
  url={https://huggingface.co/OpenTransformer/wire-speed-transformer}
}
```

## 🙏 Acknowledgments

- DeepSeek for the tokenizer
- Anthropic's Claude for pair programming
- vast.ai for GPU compute

## 📜 License

MIT

---

*Built by OpenTransformers - pushing the boundaries of what's possible with transformers.*