---
pipeline_tag: text-generation
inference: true
widget:
- text: 'def print_hello_world():'
  example_title: Hello world
  group: Python
license: bigcode-openrail-m
datasets:
- bigcode/commitpackft
- bigcode/oasst-octopack
metrics:
- code_eval
library_name: transformers
tags:
- code
model-index:
- name: OctoCoder
  results:
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize Python
    metrics:
    - name: pass@1
      type: pass@1
      value: 46.2
      verified: false
  - task:
      type: text-generation
    dataset:
      type: bigcode/humanevalpack
      name: HumanEvalSynthesize JavaScript
    metrics:
    - name: pass@1
      type: pass@1
      value: 39.2
      verified: false
---
# OctoCoder

Play with the model on the [TODO Playground](https://huggingface.co/spaces/bigcode/bigcode-playground).
<style>
table{
border-collapse: collapse;
}
</style>
<table>
<tr>
<th>Model (↓)</th>
<th>Python</th>
<th>JavaScript</th>
<th>Java</th>
<th>Go</th>
<th>C++</th>
<th>Rust</th>
<th>Avg.</th>
</tr>
</table>
<hr style="background-color: black;">
<center><h4>HumanEvalFix</h4></center>
<hr style="background-color: black;">
<center>Non-permissive models</center>
<hr style="background-color: black;">
<table>
<tr>
<td>WizardCoder</td>
<td>31.8</td>
<td>29.5</td>
<td>12.7</td>
<td>30.4</td>
<td>18.7</td>
<td>13.0</td>
<td>22.7</td>
</tr>
<tr>
<td>GPT-4</td>
<td>47.0</td>
<td>48.2</td>
<td>50.0</td>
<td>50.6</td>
<td>47.6</td>
<td>43.3</td>
<td><u>47.8</u></td>
</tr>
</table>
<hr style="background-color: black;">
<center>Permissive models</center>
<hr style="background-color: black;">
<table>
<tr>
<td>InstructCodeT5+<sup>‡</sup></td>
<td>2.7</td>
<td>1.2</td>
<td>4.3</td>
<td>2.1</td>
<td>0.2</td>
<td>0.5</td>
<td>1.8</td>
</tr>
<tr>
<td>BLOOMZ<sup>+</sup></td>
<td>16.6</td>
<td>15.5</td>
<td>15.2</td>
<td>16.4</td>
<td>6.7</td>
<td>5.7</td>
<td>12.5</td>
</tr>
<tr>
<td>StarChat-β</td>
<td>18.1</td>
<td>18.1</td>
<td>24.1</td>
<td>18.1</td>
<td>8.2</td>
<td>3.6</td>
<td>11.2</td>
</tr>
<tr>
<td>CodeGeeX2<sup>*</sup></td>
<td>15.9</td>
<td>14.7</td>
<td>18.0</td>
<td>13.6</td>
<td>4.3</td>
<td>6.1</td>
<td>12.1</td>
</tr>
<tr>
<td>StarCoder</td>
<td>8.7</td>
<td>15.7</td>
<td>13.3</td>
<td>20.1</td>
<td>15.6</td>
<td>6.7</td>
<td>13.4</td>
</tr>
<tr>
<td>OctoGeeX<sup>*</sup></td>
<td>28.1</td>
<td>27.7</td>
<td>30.4</td>
<td>27.6</td>
<td>22.9</td>
<td>9.6</td>
<td>24.4</td>
</tr>
<tr>
<td>OctoCoder</td>
<td><strong>30.2</strong></td>
<td><strong>28.4</strong></td>
<td><strong>30.6</strong></td>
<td><strong>30.2</strong></td>
<td><strong>26.1</strong></td>
<td><strong>16.5</strong></td>
<td><strong>27.0</strong></td>
</tr>
</table>
<hr style="background-color: black;">
<center><h4>HumanEvalExplain</h4></center>
<hr style="background-color: black;">
<center>Non-permissive models</center>
<hr style="background-color: black;">
<table>
<tr>
<td>WizardCoder</td>
<td>32.5</td>
<td>33.0</td>
<td>27.4</td>
<td>26.7</td>
<td>28.2</td>
<td>16.9</td>
<td>27.5</td>
</tr>
<tr>
<td>GPT-4</td>
<td>64.6</td>
<td>57.3</td>
<td>51.2</td>
<td>58.5</td>
<td>38.4</td>
<td>42.7</td>
<td><u>52.1</u></td>
</tr>
</table>
<hr style="background-color: black;">
<center>Permissive models</center>
<hr style="background-color: black;">
<table>
<tr>
<td>InstructCodeT5+<sup>‡</sup></td>
<td>20.8</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.1</td>
<td>0.0</td>
<td>3.5</td>
</tr>
<tr>
<td>BLOOMZ<sup>+</sup></td>
<td>14.7</td>
<td>8.8</td>
<td>12.1</td>
<td>8.5</td>
<td>0.6</td>
<td>0.0</td>
<td>7.5</td>
</tr>
<tr>
<td>StarChat-β</td>
<td>25.4</td>
<td>21.5</td>
<td>24.5</td>
<td>18.4</td>
<td>17.6</td>
<td>13.2</td>
<td>20.1</td>
</tr>
<tr>
<td>CodeGeeX2<sup>*</sup></td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
</tr>
<tr>
<td>StarCoder</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
<td>0.0</td>
</tr>
<tr>
<td>OctoGeeX<sup>*</sup></td>
<td>30.4</td>
<td>24.0</td>
<td>24.7</td>
<td><strong>21.7</strong></td>
<td>21.0</td>
<td><strong>15.9</strong></td>
<td>22.9</td>
</tr>
<tr>
<td>OctoCoder</td>
<td><strong>35.1</strong></td>
<td><strong>24.5</strong></td>
<td><strong>27.3</strong></td>
<td>21.1</td>
<td><strong>24.1</strong></td>
<td>14.8</td>
<td><strong>24.5</strong></td>
</tr>
</table>
<hr style="background-color: black;">
<center><h4>HumanEvalSynthesize</h4></center>
<hr style="background-color: black;">
<center>Non-permissive models</center>
<hr style="background-color: black;">
<table>
<tr>
<td>WizardCoder</td>
<td>57.3</td>
<td>49.5</td>
<td>36.1</td>
<td>36.4</td>
<td>40.9</td>
<td>20.2</td>
<td>40.1</td>
</tr>
<tr>
<td>GPT-4</td>
<td>86.6</td>
<td>82.9</td>
<td>81.7</td>
<td>72.6</td>
<td>78.7</td>
<td>67.1</td>
<td><u>78.3</u></td>
</tr>
</table>
<hr style="background-color: black;">
<center>Permissive models</center>
<hr style="background-color: black;">
<table>
<tr>
<td>InstructCodeT5+<sup>‡</sup></td>
<td>37.0</td>
<td>18.9</td>
<td>17.4</td>
<td>9.5</td>
<td>19.8</td>
<td>0.3</td>
<td>17.1</td>
</tr>
<tr>
<td>BLOOMZ<sup>+</sup></td>
<td>15.6</td>
<td>14.8</td>
<td>18.4</td>
<td>8.4</td>
<td>6.5</td>
<td>5.5</td>
<td>11.5</td>
</tr>
<tr>
<td>StarChat-β</td>
<td>33.5</td>
<td>31.4</td>
<td>26.7</td>
<td>25.5</td>
<td>26.6</td>
<td>14.0</td>
<td>26.3</td>
</tr>
<tr>
<td>CodeGeeX2<sup>*</sup></td>
<td>35.9</td>
<td>32.2</td>
<td>30.8</td>
<td>22.5</td>
<td>29.3</td>
<td>18.1</td>
<td>28.1</td>
</tr>
<tr>
<td>StarCoder</td>
<td>33.6</td>
<td>30.8</td>
<td>30.2</td>
<td>17.6</td>
<td>31.6</td>
<td>21.8</td>
<td>27.6</td>
</tr>
<tr>
<td>OctoGeeX<sup>*</sup></td>
<td>44.7</td>
<td>33.8</td>
<td>36.9</td>
<td>21.9</td>
<td>32.3</td>
<td>15.7</td>
<td>30.9</td>
</tr>
<tr>
<td>OctoCoder</td>
<td><strong>46.2</strong></td>
<td><strong>39.2</strong></td>
<td><strong>38.2</strong></td>
<td><strong>30.4</strong></td>
<td><strong>35.6</strong></td>
<td><strong>23.4</strong></td>
<td><strong>35.5</strong></td>
</tr>
</table>
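
The scores above are pass@1 values from the `code_eval` metric. pass@k is commonly estimated with the unbiased estimator introduced for HumanEval: generate n samples per problem, count the c that pass the tests, and compute the probability that at least one of k drawn samples is correct. A minimal sketch (the function name is illustrative, not part of any library here):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations of which c are correct,
    passes the unit tests."""
    if n - c < k:  # every size-k draw must contain a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 this reduces to the fraction of correct generations:
print(pass_at_k(20, 5, 1))  # 0.25
```

For k=1, the estimator is simply c/n averaged over problems, which is what the tables report.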

## Table of Contents

1. [Model Summary](#model-summary)
2. [Use](#use)
3. [Limitations](#limitations)
4. [Training](#training)
5. [License](#license)
6. [Citation](#citation)

## Model Summary

OctoCoder is an instruction-tuned model with 15.5B parameters, created by fine-tuning StarCoder on CommitPackFT & OASST as described in the OctoPack paper.

- **Repository:** [bigcode/octopack](https://github.com/bigcode-project/octopack)
- **Paper:** [TODO]()
- **Languages:** 80+ programming languages

## Use

### Intended use

The model follows instructions provided in the input. We recommend prefacing your input with "Question: " and finishing with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
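
As an illustration, that format can be produced with a small helper (`build_prompt` is a hypothetical name, not part of the model or any library):

```python
def build_prompt(instruction: str) -> str:
    # Wrap a raw instruction in the "Question: ... Answer:" format
    # the model was instruction-tuned to follow.
    return f"Question: {instruction}\n\nAnswer:"

print(build_prompt("Please write a function in Python that performs bubble sort."))
```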

**Feel free to share your generations in the Community tab!**

### Generation

```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/octocoder"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

inputs = tokenizer.encode("Question: Please write a function in Python that performs bubble sort.\n\nAnswer:", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```

# Training

## Model

- **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
- **Steps:** 250k pretraining & 30 instruction tuning
- **Pretraining tokens:** 1 trillion pretraining & 2M instruction tuning
- **Precision:** bfloat16

## Hardware

- **Pretraining:**
  - **GPUs:** 512 Tesla A100
  - **Training time:** 24 days
- **Instruction tuning:**
  - **GPUs:** 8 Tesla A100
  - **Training time:** 4 hours

## Software

- **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
- **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)

# Citation

TODO