Commit History
Fix chatml prompt template by removing a linebreak (#922) 03c6318
Support for Mamba (#915) 40a6362
chore: clarify README on sharegpt system role d339beb
fix(tokenizer): handle fast tokenizer properly for bos/eos (#914) fde091c
Pin flash-attn to 2.3.3 (#919) 06ae392
Support device_map=sequential & max_memory config parameters (#903) 992e742
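A minimal sketch of what #903 enables in the config, assuming these keys pass through to the Hugging Face `from_pretrained` arguments of the same names; the device indices and memory values are illustrative:

```yaml
# sketch: sequential placement with per-device memory caps, per #903
device_map: sequential
max_memory:
  0: 20GiB     # cap for GPU 0 (illustrative value)
  cpu: 96GiB   # optional CPU offload budget (illustrative value)
```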
Feat(wandb): Refactor to be more flexible (#767) a1da39c
feature: loss watchdog for terminating training runs that are failing (#899) 58ec8b1
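A hedged sketch of how the loss watchdog from #899 might look in the YML; the key names and values are assumptions, not confirmed by this log:

```yaml
# assumed keys: abort the run when loss spikes past a threshold
loss_watchdog_threshold: 5.0   # treat loss above this as a failing run
loss_watchdog_patience: 3      # consecutive bad steps before aborting
```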
Remove learning rate scheduler in deepspeed config to avoid conflict (#909) 476a205
fix for Qwen with LoRA (#906) 3e3229e
ensure merged model matches the training dtype (#902) 1d21aa6
Determine FSDP/DeepSpeed settings on device selection (#883) 71b7ea3
fix: remove FA for Qwen examples (#900) a48dbf6
update datasets version to cut down on warnings from the pyarrow arg change (#897) 6a4562a
Feat: Add Qwen (#894) 1115c50
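Pointing a config at Qwen presumably works like any other base-model swap; the model id below is illustrative, and `trust_remote_code` is an assumption based on Qwen shipping custom modeling code on the Hub:

```yaml
base_model: Qwen/Qwen-7B   # illustrative Hub model id
trust_remote_code: true    # assumed: Qwen uses custom modeling code
```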
fix: warning should not show if eval_batch_size not provided (#896) 7ee3c4c
Feat: Add warmup_ratio (#893) fb12895
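A one-line sketch of #893, assuming `warmup_ratio` is a fraction of total training steps and an alternative to a fixed `warmup_steps` count:

```yaml
warmup_ratio: 0.1   # assumed semantics: warm up over ~10% of total steps
```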
chore(doc): Add info on changing role in sharegpt (#886) 9fc29e0
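For context on the two sharegpt doc commits above, a record in the conventional sharegpt shape (shown as YAML for consistency; field names follow the common sharegpt convention, not this repo's docs):

```yaml
conversations:
  - from: system   # the role field these doc commits discuss
    value: You are a helpful assistant.
  - from: human
    value: Hello!
  - from: gpt
    value: Hi there!
```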
fix: revert local dir dataset load (#878) 575a082
Install from git URL (#874) ddf8150
Phi update 202311 (#876) 9bf854e
don't train if eval split is too small (#873) 797f3dd
try #2: pin HF transformers and accelerate to latest release, don't reinstall PyTorch (#867) 0de1457
Feat: Add dataset loading from S3, GCS (#765) 3cc67d2
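A sketch of remote dataset paths per #765, assuming the loader keys off the `s3://` / `gs://` URL scheme; bucket and object names are hypothetical:

```yaml
datasets:
  - path: s3://my-bucket/train.jsonl   # hypothetical bucket/object
    type: alpaca
  - path: gs://my-bucket/train.jsonl
    type: alpaca
```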
allow overriding of model_config parameters from the YML (#853) 1bc1186
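A hedged sketch of #853, assuming the override block mirrors fields on the Hugging Face model config; the key name is taken from the commit title and `rope_scaling` is just an illustrative override:

```yaml
model_config:        # key name assumed from the commit title
  rope_scaling:      # illustrative HF config field to override
    type: linear
    factor: 2.0
```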
add e2e tests for checking functionality of resume from checkpoint (#865) b3a61e8
make docker command more robust (#861) 8a8d1c4
lint fix that didn't get caught by linter (#866) 332984d
Docs: add instructions for 1-click launching on public clouds (#862) b33c1d5
multipack len should use max, not min (#863) 0c2a630
adds llama and mistral dropout support (#858) db8a8af
various bugfixes (#856) 1470650
chore(doc): Separate section on runpod (#860) 501b4d1
feat(doc): add more info on train_on_split (#855) 306fe19
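A sketch of the `train_on_split` option documented in #855, assuming it selects which dataset split is used for training; the dataset id is hypothetical:

```yaml
datasets:
  - path: my-org/my-dataset      # hypothetical HF dataset id
    type: alpaca
    train_on_split: validation   # assumed: train on a non-default split
```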
include the suffix-modified string in ASCII art (#852) 614cff4
cleanup the old multipack dataloader (#841) 1a6309c
Pin optimum package (#838) 105d0b3