Spaces: codeparrot/code-generation-models

loubnabnl (HF Staff) committed on Jun 3, 2022

Commit 66aea4c (1 parent: bb51c11)

Update datasets/github_code.md

Files changed (1)

datasets/github_code.md CHANGED

@@ -19,6 +19,10 @@ print(next(iter(ds)))
 ```
 You can see that in addition to the code, the samples include some metadata: repo name, path, language, license, and the size of the file. Below is the distribution of programming languages in this dataset.
+<p align="center">
+    <img src="https://huggingface.co/datasets/lvwerra/github-code/resolve/main/github-code-stats-alpha.png" alt="drawing" width="450"/>
+</p>
 Below is the distribution of the pretraining data size of some code models:
 <p align="center">
     <img src="https://huggingface.co/datasets/loubnabnl/repo-images/resolve/main/data_distrub.png" alt="drawing" width="450"/>