microsoft/phi-1

@@ -11,7 +11,7 @@ tags:
 ---
 ## Model Summary
-The language model Phi-1 is a Transformer with 1.3 billion parameters, specialized for basic Python coding. Its training involved a variety of data sources, including subsets of Python codes from [The Stack v1.2](https://huggingface.co/datasets/bigcode/the-stack), Q&A content from [StackOverflow](https://archive.org/download/stackexchange), competition code from [code_contests](https://github.com/deepmind/code_contests), and synthetic Python textbooks and exercises generated by [gpt-3.5-turbo-0301](https://platform.openai.com/docs/models/gpt-3-5). Even though the model and the datasets are relatively small compared to contemporary Large Language Models (LLMs), phi-1 has demonstrated an impressive accuracy rate exceeding 50% on the simple Python coding benchmark, HumanEval.
 ## Intended Uses
 Given the nature of the training data, Phi-1 is best suited for prompts using the code format:
@@ -37,6 +37,7 @@ where the model generates the code after the comments. (Note: This is a legitima
 * If you are using `transformers>=4.36.0`, always load the model with `trust_remote_code=True` to prevent side-effects.
 ## Sample Code
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -56,10 +57,9 @@ text = tokenizer.batch_decode(outputs)[0]
 print(text)
 ```
-**Remark.** In the generation function, our model currently does not support beam search (`num_beams` >1).
 Furthermore, in the forward pass of the model, we currently do not support outputting hidden states or attention values, or using custom input embeddings.
 ## Limitations of Phi-1
 * Limited Scope: 99.8% of the Python scripts in our fine-tuning dataset use only the packages "typing, math, random, collections, datetime, itertools". If the model generates Python scripts that utilize other packages, we strongly recommend users manually verify all API uses.
@@ -93,7 +93,7 @@ Given these potential pitfalls, and others not explicitly mentioned, it's essent
 ### Software
 * [PyTorch](https://github.com/pytorch/pytorch)
 * [DeepSpeed](https://github.com/microsoft/DeepSpeed)
-* [flash-attention](https://github.com/HazyResearch/flash-attention)
 ### License
 The model is licensed under the [Research License](https://huggingface.co/microsoft/phi-1/resolve/main/Research%20License.docx).

 ---
 ## Model Summary
+The language model Phi-1 is a Transformer with 1.3 billion parameters, specialized for basic Python coding. Its training involved a variety of data sources, including subsets of Python codes from [The Stack v1.2](https://huggingface.co/datasets/bigcode/the-stack), Q&A content from [StackOverflow](https://archive.org/download/stackexchange), competition code from [code_contests](https://github.com/deepmind/code_contests), and synthetic Python textbooks and exercises generated by [gpt-3.5-turbo-0301](https://platform.openai.com/docs/models/gpt-3-5). Even though the model and the datasets are relatively small compared to contemporary Large Language Models (LLMs), Phi-1 has demonstrated an impressive accuracy rate exceeding 50% on the simple Python coding benchmark, HumanEval.
 ## Intended Uses
 Given the nature of the training data, Phi-1 is best suited for prompts using the code format:
 * If you are using `transformers>=4.36.0`, always load the model with `trust_remote_code=True` to prevent side-effects.
 ## Sample Code
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 print(text)
 ```
+**Remark.** In the generation function, our model currently does not support beam search (`num_beams > 1`).
 Furthermore, in the forward pass of the model, we currently do not support outputting hidden states or attention values, or using custom input embeddings.
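The restrictions in the remark above can be checked before any call to the model. The following is a minimal sketch, not part of the model card: the helper name `check_generate_kwargs` is hypothetical, and mapping the remark onto the `transformers` keyword arguments `num_beams`, `output_hidden_states`, `output_attentions`, and `inputs_embeds` is an assumption about which arguments the remark refers to.

```python
def check_generate_kwargs(**kwargs):
    """Raise early if kwargs request a feature the remark lists as unsupported.

    Hypothetical helper: the keyword names are assumed to be the ones the
    remark refers to. Returns the kwargs unchanged when nothing is flagged.
    """
    problems = []
    if kwargs.get("num_beams", 1) > 1:
        problems.append("beam search (num_beams > 1)")
    if kwargs.get("output_hidden_states"):
        problems.append("output_hidden_states")
    if kwargs.get("output_attentions"):
        problems.append("output_attentions")
    if kwargs.get("inputs_embeds") is not None:
        problems.append("custom input embeddings (inputs_embeds)")
    if problems:
        raise ValueError("Not supported by Phi-1: " + ", ".join(problems))
    return kwargs


# Intended use (model and inputs come from the sample code; the token
# budget below is an illustrative value, not a recommendation):
#   outputs = model.generate(**inputs, **check_generate_kwargs(max_new_tokens=64))
```

Failing before the forward pass gives a readable error message instead of whatever the custom modeling code raises for an unsupported argument.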
 ## Limitations of Phi-1
 * Limited Scope: 99.8% of the Python scripts in our fine-tuning dataset use only the packages "typing, math, random, collections, datetime, itertools". If the model generates Python scripts that utilize other packages, we strongly recommend users manually verify all API uses.
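One way to act on the Limited Scope recommendation is to list the imports in a generated script and flag any package outside the six named above. The sketch below is an illustration only: the function name `packages_to_verify` and the sample string are hypothetical, and it inspects static `import` statements only, not dynamic imports.

```python
import ast

# Packages the model card names as covering 99.8% of the fine-tuning scripts.
COMMON_PACKAGES = {"typing", "math", "random", "collections", "datetime", "itertools"}


def packages_to_verify(source):
    """Return top-level packages imported by `source` that are not in COMMON_PACKAGES."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b as c" -> top-level package "a"
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" -> top-level package "a"; relative imports are skipped
            found.add(node.module.split(".")[0])
    return found - COMMON_PACKAGES


# Hypothetical model output, used only to show the call.
generated = "import math\nimport numpy as np\nfrom collections import Counter\n"
print(sorted(packages_to_verify(generated)))  # ['numpy']
```

A non-empty result does not mean the generated code is wrong; it only marks the API calls that deserve a manual check against the package's documentation.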
 ### Software
 * [PyTorch](https://github.com/pytorch/pytorch)
 * [DeepSpeed](https://github.com/microsoft/DeepSpeed)
+* [Flash-Attention](https://github.com/HazyResearch/flash-attention)
 ### License
 The model is licensed under the [Research License](https://huggingface.co/microsoft/phi-1/resolve/main/Research%20License.docx).