---
license: apache-2.0
---
Mistralized TinyLlama: TinyLlama converted to the Mistral architecture, since flash-attention training on the Llama architecture with flash-attn is buggy.

It is based on the 3T-token base model (not chat-tuned).

Not extensively tested. Enjoy!