| ### Micro Mistral | |
| This is a small Mistral model with 6 layers. | |
| Like the smol llama variants, it uses GQA (grouped-query attention) and tied embeddings. | |
| Unlike them, it follows the Mistral-style architecture, which adds sliding window attention. | |
| Combining GQA and tied embeddings yields an efficient 0.5B-parameter model that keeps the Mistral architecture, which is supported in downstream applications. | |
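As a rough illustration of why GQA and tied embeddings save parameters, and of what sliding window attention restricts, here is a minimal sketch in plain Python. Only the layer count (6) comes from this card. Every other dimension (`vocab_size`, `hidden_size`, `intermediate_size`, `num_heads`, `num_kv_heads`) is a hypothetical placeholder, not the model's actual configuration, and the helper names `TinyMistralShape`, `count_params`, and `sliding_window_mask` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class TinyMistralShape:
    # All values are hypothetical placeholders, except num_layers = 6.
    vocab_size: int = 32000
    hidden_size: int = 2048
    intermediate_size: int = 8192
    num_layers: int = 6
    num_heads: int = 32
    num_kv_heads: int = 8  # GQA: fewer key/value heads than query heads
    tie_embeddings: bool = True


def count_params(s: TinyMistralShape) -> int:
    """Approximate weight count for a Mistral-style decoder (no biases)."""
    head_dim = s.hidden_size // s.num_heads
    kv_dim = s.num_kv_heads * head_dim
    # q and o projections are hidden x hidden; k and v shrink to hidden x kv_dim.
    attn = 2 * s.hidden_size * s.hidden_size + 2 * s.hidden_size * kv_dim
    # Gated MLP: gate, up, and down projections.
    mlp = 3 * s.hidden_size * s.intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * s.hidden_size
    embed = s.vocab_size * s.hidden_size
    # An untied model stores a separate output (lm_head) matrix.
    lm_head = 0 if s.tie_embeddings else embed
    final_norm = s.hidden_size
    return s.num_layers * (attn + mlp + norms) + embed + lm_head + final_norm


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """True where query position i may attend to key position j:
    causal (j <= i) and within the last `window` positions."""
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


if __name__ == "__main__":
    tied = count_params(TinyMistralShape())
    untied = count_params(TinyMistralShape(tie_embeddings=False))
    full_mha = count_params(TinyMistralShape(num_kv_heads=32))
    print(f"tied + GQA:   {tied:,}")
    print(f"untied + GQA: {untied:,}")
    print(f"tied + MHA:   {full_mha:,}")
    for row in sliding_window_mask(seq_len=5, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

With these placeholder sizes, tying the embeddings removes one full `vocab_size x hidden_size` matrix, and GQA shrinks the key and value projections in every layer. The mask shows that each token attends only to itself and the tokens immediately before it, up to the window size.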
| #### Dataset | |
| - Minipile | |
| - Instruct | |
| - Math | |
| - OpenOrca | |
| - Synthetic Data | |
| TODO: Complete Dataset section | |