matlok 's Collections

Papers - Pre-training - Dynamic Context Length

For HyperClova X they split 90% at 4096 and 10% at 32k context length during pt