ronantakizawa 
posted an update 20 days ago
Introducing the github-top-code dataset: a curated dataset of 1.3M+ source code files from GitHub's top-ranked developers.

I collected the best source code files from GitHub's highest-trending developers of all time and compiled them into a dataset for training LLMs to write well-structured, production-grade code.

#dataset #codedataset #pretraining

ronantakizawa/github-top-code