Update README.md
README.md CHANGED
@@ -168,7 +168,7 @@ Developers should apply responsible AI best practices and are responsible for en
 ### Model
-* Architecture: Phi-3 Small-8K-Instruct has 7B parameters and is a dense decoder-only Transformer model. The model is fine-tuned with Supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidelines.
 * Inputs: Text. It is best suited for prompts using chat format.
 * Context length: 8K tokens
 * GPUs: 1024 H100-80G
@@ -247,7 +247,7 @@ We take a closer look at different categories across 80 public benchmark dataset
 * [Triton](https://github.com/openai/triton)

 ## Hardware
-Note that by default, the Phi-3-Small model uses flash attention, which requires certain types of GPU hardware to run. We have tested on the following GPU types:
 * NVIDIA A100
 * NVIDIA A6000
 * NVIDIA H100
 ### Model
+* Architecture: Phi-3 Small-8K-Instruct has 7B parameters and is a dense decoder-only Transformer model with alternating dense and blocksparse attention. The model is fine-tuned with Supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) to ensure alignment with human preferences and safety guidelines.
 * Inputs: Text. It is best suited for prompts using chat format.
 * Context length: 8K tokens
 * GPUs: 1024 H100-80G
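The "chat format" the Inputs bullet recommends can be sketched as a plain-string template. This is only a sketch: the role tags `<|user|>`/`<|assistant|>` and the `<|end|>` turn terminator are an assumption based on other Phi-3 chat templates, and `build_chat_prompt` is a hypothetical helper, not part of the model repo.

```python
def build_chat_prompt(messages: list[dict]) -> str:
    """Render a list of {"role": ..., "content": ...} turns into a chat prompt.

    ASSUMPTION: the <|user|>/<|assistant|> role tags and the <|end|> turn
    terminator mirror other Phi-3 releases; check the tokenizer's own chat
    template before relying on this exact layout.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages]
    parts.append("<|assistant|>\n")  # trailing tag cues the model to answer
    return "".join(parts)

prompt = build_chat_prompt(
    [{"role": "user", "content": "How to explain the Internet to a medieval knight?"}]
)
```

In practice, `tokenizer.apply_chat_template` from `transformers` is the authoritative way to render this format; the helper above only illustrates the structure such a template produces.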
 * [Triton](https://github.com/openai/triton)

 ## Hardware
+Note that by default, the Phi-3-Small model uses flash attention 2 and Triton blocksparse attention, which require certain types of GPU hardware to run. We have tested on the following GPU types:
 * NVIDIA A100
 * NVIDIA A6000
 * NVIDIA H100
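The tested GPU list lines up with flash attention 2's usual requirement of Ampere-class hardware or newer (CUDA compute capability 8.0+: A100 is 8.0, A6000 is 8.6, H100 is 9.0). A minimal sketch of a capability check, assuming that rule of thumb; `pick_attn_implementation` is a hypothetical helper, not part of the model repo:

```python
def pick_attn_implementation(major: int, minor: int) -> str:
    """Map a CUDA compute capability to an attention backend name.

    Hedged heuristic: flash attention 2 generally needs Ampere (sm80,
    e.g. A100/A6000) or newer (H100); anything older falls back to the
    plain PyTorch "eager" attention implementation.
    """
    return "flash_attention_2" if (major, minor) >= (8, 0) else "eager"
```

In a real setup the capability would come from `torch.cuda.get_device_capability()`, and the chosen string would be passed as the `attn_implementation` argument to `AutoModelForCausalLM.from_pretrained`, so that unsupported GPUs load the model with the fallback backend instead of failing.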