Update README.md
README.md CHANGED
@@ -168,7 +168,7 @@ Developers should apply responsible AI best practices and are responsible for en
 ### Model
-* Architecture: Phi-3 Small-8K-Instruct has 7B parameters and is a dense decoder-only Transformer model. The model is fine-tuned with Supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidelines.
 * Inputs: Text. It is best suited for prompts using chat format.
 * Context length: 8K tokens
 * GPUs: 1024 H100-80G
@@ -247,7 +247,7 @@ We take a closer look at different categories across 80 public benchmark dataset
 * [Triton](https://github.com/openai/triton)

 ## Hardware
-Note that by default, the Phi-3-Small model uses flash attention, which requires certain types of GPU hardware to run. We have tested on the following GPU types:
 * NVIDIA A100
 * NVIDIA A6000
 * NVIDIA H100
 ### Model
+* Architecture: Phi-3 Small-8K-Instruct has 7B parameters and is a dense decoder-only Transformer model with alternating dense and blocksparse attention. The model is fine-tuned with Supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidelines.
 * Inputs: Text. It is best suited for prompts using chat format.
 * Context length: 8K tokens
 * GPUs: 1024 H100-80G
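The "chat format" the Inputs bullet recommends can be sketched as a plain-string template. This is only a sketch: the role tags `<|user|>`/`<|assistant|>` and the `<|end|>` turn terminator are an assumption based on other Phi-3 chat templates, and `build_chat_prompt` is a hypothetical helper, not part of the model repo.

```python
def build_chat_prompt(messages: list[dict]) -> str:
    """Render a list of {"role": ..., "content": ...} turns into a chat prompt.

    ASSUMPTION: the <|user|>/<|assistant|> role tags and the <|end|> turn
    terminator mirror other Phi-3 releases; check the tokenizer's own chat
    template before relying on this exact layout.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    parts.append("<|assistant|>\n")  # trailing tag cues the model to answer
    return "".join(parts)

prompt = build_chat_prompt(
    [{"role": "user", "content": "How to explain the Internet to a medieval knight?"}]
)
```

In practice, `tokenizer.apply_chat_template` from `transformers` is the authoritative way to render this format; the helper above only illustrates the structure such a template produces.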
 * [Triton](https://github.com/openai/triton)

 ## Hardware
+Note that by default, the Phi-3-Small model uses flash attention 2 and Triton blocksparse attention, which require certain types of GPU hardware to run. We have tested on the following GPU types:
 * NVIDIA A100
 * NVIDIA A6000
 * NVIDIA H100
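The tested GPU list lines up with flash attention 2's usual requirement of Ampere-class hardware or newer (CUDA compute capability 8.0+: A100 is 8.0, A6000 is 8.6, H100 is 9.0). A minimal sketch of a capability check, assuming that rule of thumb; `pick_attn_implementation` is a hypothetical helper, not part of the model repo:

```python
def pick_attn_implementation(major: int, minor: int) -> str:
    """Map a CUDA compute capability to an attention backend name.

    Hedged heuristic: flash attention 2 generally needs Ampere (sm80,
    e.g. A100/A6000) or newer (H100); anything older falls back to the
    plain PyTorch "eager" attention implementation.
    """
    return "flash_attention_2" if (major, minor) >= (8, 0) else "eager"
```

In a real setup the capability would come from `torch.cuda.get_device_capability()`, and the chosen string would be passed as the `attn_implementation` argument to `AutoModelForCausalLM.from_pretrained`, so that unsupported GPUs load the model with the fallback backend instead of failing.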