updated generation
README.md
CHANGED
@@ -58,16 +58,6 @@ outputs = model.generate(inputs)
 print(tokenizer.decode(outputs[0]))
 ```
 
-### Fill-in-the-middle
-Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:
-
-```Java
-input_text = "<fim_prefix>public class HelloWorld {\n public static void main(String[] args) {<fim_suffix>}\n}<fim_middle>"
-inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
-outputs = model.generate(inputs)
-print(tokenizer.decode(outputs[0]))
-```
-
 ### Attribution & Other Requirements
 
 The pretraining dataset of the model was filtered for permissive licenses only. Nevertheless, the model can generate source code verbatim from the dataset. The code's license might require attribution and/or other specific requirements that must be respected. We provide a [search index](https://huggingface.co/spaces/bigcode/starcoder-search) that lets you search through the pretraining data to identify where generated code came from and apply the proper attribution to your code.
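For reference, the prompt layout used by the removed Fill-in-the-middle example can be sketched as a small string helper. This is only a minimal sketch: `build_fim_prompt` and the three constant names are hypothetical and not part of the model's or tokenizer's API. The token strings themselves are copied from the removed README lines.

```python
# Special tokens exactly as they appear in the removed README example.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Hypothetical helper: place the prefix and suffix around the special
    tokens, ending with the middle token so generation fills in the gap."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Rebuilds the input string from the removed example.
prompt = build_fim_prompt(
    "public class HelloWorld {\n public static void main(String[] args) {",
    "}\n}",
)
```

The resulting `prompt` would then be passed to `tokenizer.encode(...)` and `model.generate(...)` in the same way as `input_text` in the removed lines.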