Add DeepSeek-Lite Protocol: 50M params, FineWeb-Edu, TikToken, BFloat16 6ec2818 AdriBat1 committed on Jan 3
Add Deep-NanoGPT experiment (Phase 1 & 2): resumable training, inference, 72-layer models 671ce97 AdriBat1 committed on Jan 2