LLM-1B-Lab / llm_lab / data / dataset.py

Commit History

Fix dead split parameter in PackedStreamingDataset._load_dataset
0cd5689

Vjeong Claude Sonnet 4.6 committed on

Add Code CPT pipeline for injecting Python code capability
a424729

Vjeong Claude Opus 4.6 committed on

docs: translate all Korean comments and docstrings to English
858e8b2

Vjeong Claude Sonnet 4.6 committed on

refactor(data): replace per-worker seed strategy with full sharding in IterableDataset
8a39fec

Vjeong Claude Sonnet 4.6 committed on

Initial commit: LLM-1B-Lab project setup
8a58ffe

Vjeong Claude Opus 4.6 committed on