Commit History
Adds support for flash-attn rotary embedding and fused dense layers. 0bbd68a
Adds support for MQA/GQA and attention mask during training. de35f90
Update modeling_mixformer_sequential.py d38e6f9
Upload modeling_mixformer_sequential.py b6a7e2f
Upload README.md bc09a08
fix(phi-1_5): Checks length of `attention_mask` if it is passed as a direct tensor. f9f2ac7
Support for `attention_mask` in forward pass. 3128bb6
Update README.md 7d482dd
Update README.md c8f6ad8
Link paper to arXiv (#5) 762a311
Update README.md ea95720
Update README.md 4bba51c
Update README.md 52e294a
Upload tokenizer 9efbcaf
Upload MixFormerSequentialForCausalLM d655135
Update README.md 07a048e
Update README.md b630515
Update README.md 40b496f
Update README.md d9c7521
Update README.md 6ddac37
Update README.md cd4510c
Update README.md 34046b0
Update README.md 24ad69c
Update README.md b3d67f3
Upload Research License.docx 14be656
Upload tokenizer 6157c47
Upload MixFormerSequentialForCausalLM e656142
Upload tokenizer 4b752e7
Upload MixFormerSequentialForCausalLM 2bfd6ef
Upload tokenizer 67f350b
Upload MixFormerSequentialForCausalLM ba44a90
Upload tokenizer 67a43eb
Upload MixFormerSequentialForCausalLM 1698206
initial commit 98416e6
Gunasekar committed