docs: add data pipeline, tokenization, architecture, pre-training, fine-tuning docs 7f4eac4 verified vthawfeek committed on Jun 4