Update README.md
README.md
CHANGED
|
@@ -23,21 +23,24 @@ pinned: false
|
|
| 23 |
</p>
|
| 24 |
</li>
|
| 25 |
<br>
|
|
|
|
| 26 |
<li>
|
| 27 |
<p>
|
| 28 |
<b>Spaces:</b> code generation with: <a href="https://huggingface.co/codeparrot/codeparrot" class="underline">CodeParrot (1.5B)</a>, <a href="https://huggingface.co/facebook/incoder-6B" class="underline">InCoder</a> (6B) and <a href="https://github.com/salesforce/CodeGen" class="underline">CodeGen</a> (6B)
|
| 29 |
</p>
|
| 30 |
</li>
|
| 31 |
<br>
|
|
|
|
| 32 |
<li><b>Models:</b> CodeParrot (1.5B) and CodeParrot-small (110M); each repo hosts different ongoing experiments in its branches.</li>
|
| 33 |
<br>
|
|
|
|
| 34 |
<li><b>Datasets:</b><ul>
|
| 35 |
<li>1- <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean" class="underline">codeparrot-clean</a>, the dataset on which we trained and evaluated CodeParrot; the splits are available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-train" class="underline">codeparrot-clean-train</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-valid" class="underline">codeparrot-clean-valid</a>.</li>
|
| 36 |
|
| 37 |
<li>2- A more filtered version of codeparrot-clean, available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-more-filtering" class="underline">codeparrot-train-more-filtering</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-more-filtering" class="underline">codeparrot-valid-more-filtering</a>.</li>
|
| 38 |
<li>3- The CodeParrot dataset after near-deduplication (initially, only exact-match deduplication was performed), available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-near-deduplication" class="underline">codeparrot-train-near-deduplication</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-near-deduplication" class="underline">codeparrot-valid-near-deduplication</a>.</li>
|
| 39 |
|
| 40 |
-
<li>4- <a href="https://huggingface.co/datasets/codeparrot/github-code" class="underline">GitHub-Code</a>, a 1TB dataset of 32 programming languages
|
| 41 |
<li>5- <a href="https://huggingface.co/datasets/codeparrot/github-jupyter" class="underline">GitHub-Jupyter</a>, a 16.3GB dataset of Jupyter Notebooks from BigQuery GitHub.</li>
|
| 42 |
<li>6- <a href="https://huggingface.co/datasets/codeparrot/apps" class="underline">APPS</a>, a benchmark for code generation with 10,000 problems.</li>
|
| 43 |
</ul>
|
|
|
|
| 23 |
</p>
|
| 24 |
</li>
|
| 25 |
<br>
|
| 26 |
+
|
| 27 |
<li>
|
| 28 |
<p>
|
| 29 |
<b>Spaces:</b> code generation with: <a href="https://huggingface.co/codeparrot/codeparrot" class="underline">CodeParrot (1.5B)</a>, <a href="https://huggingface.co/facebook/incoder-6B" class="underline">InCoder</a> (6B) and <a href="https://github.com/salesforce/CodeGen" class="underline">CodeGen</a> (6B)
|
| 30 |
</p>
|
| 31 |
</li>
|
| 32 |
<br>
|
| 33 |
+
|
| 34 |
<li><b>Models:</b> CodeParrot (1.5B) and CodeParrot-small (110M); each repo hosts different ongoing experiments in its branches.</li>
|
| 35 |
<br>
|
| 36 |
+
|
| 37 |
<li><b>Datasets:</b><ul>
|
| 38 |
<li>1- <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean" class="underline">codeparrot-clean</a>, the dataset on which we trained and evaluated CodeParrot; the splits are available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-train" class="underline">codeparrot-clean-train</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-valid" class="underline">codeparrot-clean-valid</a>.</li>
|
| 39 |
|
| 40 |
<li>2- A more filtered version of codeparrot-clean, available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-more-filtering" class="underline">codeparrot-train-more-filtering</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-more-filtering" class="underline">codeparrot-valid-more-filtering</a>.</li>
|
| 41 |
<li>3- The CodeParrot dataset after near-deduplication (initially, only exact-match deduplication was performed), available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-near-deduplication" class="underline">codeparrot-train-near-deduplication</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-near-deduplication" class="underline">codeparrot-valid-near-deduplication</a>.</li>
|
| 42 |
|
| 43 |
+
<li>4- <a href="https://huggingface.co/datasets/codeparrot/github-code" class="underline">GitHub-Code</a>, a 1TB dataset of 32 programming languages from GitHub files.</li>
|
| 44 |
<li>5- <a href="https://huggingface.co/datasets/codeparrot/github-jupyter" class="underline">GitHub-Jupyter</a>, a 16.3GB dataset of Jupyter Notebooks from BigQuery GitHub.</li>
|
| 45 |
<li>6- <a href="https://huggingface.co/datasets/codeparrot/apps" class="underline">APPS</a>, a benchmark for code generation with 10,000 problems.</li>
|
| 46 |
</ul>
|