Commit History
Add support for Gemma chat template (#1530) 60f5ce0
Unsloth gradient checkpointing offload (#1528) 6319da1
qwen2_moe support w multipack (#1455) 6086be8
fix some of the edge cases for Jamba (#1452) 05b398a
beta support for multipack with gemmoe (#1402) 8df7b88
Update fastchat_conversation_turns.py (#1294) [skip ci] 2b9687f
fix steps check for anneal on first cycle (#1316) 2c9c88b
make mlflow optional (#1317) 5894f0e
multipack for gemma (#1313) 2752d5f
allow the optimizer prune ratio for ReLoRA to be configurable (#1287) 4b997c3
Add MPS support (#1264) fac2d98
simplify handling for newer multipack patches so they can be added in a single place (#1270) 5698943
relora: magnitude pruning of the optimizer (#1245) 8c2e05a
support for true batches with multipack (#1230) 00568c1
Respect sliding_window=None (#1214) 62ca4a2 (DreamGenX)
Mixtral fixes 20240124 (#1192) [skip ci] 54d2ac1
Phi2 multipack (#1173) 814aee6
Falcon embeddings (#1149) [skip docker] e799e08
Qwen2 (#1166) f5a828a
Multipack simplify for Mixtral (#1142) 6910e6a
optimize calculation of cu_seqlens from position_ids (#1084) [skip ci] 90036eb
Added chatglm3 conversation type for training models like TinyLLama (#1036) 59b2d30
bump transformers and update attention class map name (#1023) bcc78d8
remove landmark attn and xpos rope implementations (#1010) 70b46ca
fix mistral prompt assembly (#982) 7bbaac9
Fix prompt assembly for llama (#952) 5ada140
fix: switch to using the HuggingFace Transformers NEFT implementation (#941) ef24342 (kallewoof)
Mixtral official (#942) 7fabc4d
adds llama and mistral dropout support (#858) db8a8af
various bugfixes (#856) 1470650
refactor neft patch to be more re-usable similar to trl's impl (#796) 827ec3d
Hotfix for not saving correctly (#762) 32eeeb5
Implement fused modules (#747) 15d3a65
Mistral: Sliding Window Attention with Flash Attention and Sample Packing (#732) a045db0
add noisy embedding (#721) 3bd9528 (Maxime)