th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr3.0e-5-bs32-wdft1.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr3.0e-5-bs32-wdft0.1 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr3.0e-5-bs32-wdft0.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr3.0e-5-bs16-wdft1.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr3.0e-5-bs16-wdft0.1 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr3.0e-5-bs16-wdft0.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr3.0e-5-bs8-wdft1.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr3.0e-5-bs8-wdft0.1 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr3.0e-5-bs8-wdft0.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr1.0e-5-bs32-wdft1.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr1.0e-5-bs32-wdft0.1 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr1.0e-5-bs32-wdft0.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr1.0e-5-bs16-wdft1.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr1.0e-5-bs16-wdft0.1 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr1.0e-5-bs8-wdft1.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr1.0e-5-bs8-wdft0.1 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.5-seed42-simplescaling-sweep-lr1.0e-5-bs8-wdft0.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.1-seed42-simplescaling-sweep-lr6.0e-4-bs32-wdft1.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.1-seed42-simplescaling-sweep-lr6.0e-4-bs32-wdft0.1 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.1-seed42-simplescaling-sweep-lr6.0e-4-bs32-wdft0.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.1-seed42-simplescaling-sweep-lr6.0e-4-bs16-wdft1.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.1-seed42-simplescaling-sweep-lr6.0e-4-bs16-wdft0.1 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.1-seed42-simplescaling-sweep-lr6.0e-4-bs16-wdft0.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.1-seed42-simplescaling-sweep-lr6.0e-4-bs8-wdft1.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.1-seed42-simplescaling-sweep-lr6.0e-4-bs8-wdft0.1 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.1-seed42-simplescaling-sweep-lr6.0e-4-bs8-wdft0.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.1-seed42-simplescaling-sweep-lr3.0e-5-bs32-wdft1.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.1-seed42-simplescaling-sweep-lr3.0e-5-bs32-wdft0.1 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.1-seed42-simplescaling-sweep-lr3.0e-5-bs32-wdft0.0 1B • Updated Feb 12 • 1
th135/llama-1B-20BT-weightdecay0.1-seed42-simplescaling-sweep-lr3.0e-5-bs16-wdft1.0 1B • Updated Feb 12 • 1