Set max_position_embeddings: 40000 for engine-builder workaround

#2

Workaround for a Baseten engine-builder bug in which max_seq_len from the truss config.yaml is overridden by max_position_embeddings from config.json, causing OOM at runtime for Llama-3-70B SeqCls FP8. Setting max_position_embeddings to a value ≤ the desired max_seq_len (45000) makes the override benign: even when it is applied, the resulting sequence length stays within the intended limit. See the Slack thread with Dhruv Singal, 2026-04-28 (Slingshot debug).
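A minimal sketch of applying this change to a local config.json. It assumes a standard Hugging Face config dict and models the engine builder as simply substituting max_position_embeddings for max_seq_len, as described above. The helper names (patch_max_position_embeddings, patch_file) are hypothetical and are not Baseten tooling.

```python
import json
from pathlib import Path

# Desired runtime limit from the truss config.yaml (45000 per this PR).
DESIRED_MAX_SEQ_LEN = 45000
# Value this PR writes into config.json.
PATCHED_MAX_POSITION_EMBEDDINGS = 40000


def patch_max_position_embeddings(config: dict, value: int, limit: int) -> dict:
    """Return a copy of an HF config dict with max_position_embeddings set to `value`.

    Hypothetical helper. It assumes the engine builder replaces max_seq_len
    with max_position_embeddings, so `value` must not exceed `limit`.
    """
    if value > limit:
        raise ValueError(f"max_position_embeddings={value} exceeds max_seq_len={limit}")
    patched = dict(config)  # shallow copy; the input dict is left unchanged
    patched["max_position_embeddings"] = value
    return patched


def patch_file(path: Path) -> None:
    """Rewrite a config.json on disk with the capped value (hypothetical helper)."""
    config = json.loads(path.read_text())
    patched = patch_max_position_embeddings(
        config, PATCHED_MAX_POSITION_EMBEDDINGS, DESIRED_MAX_SEQ_LEN
    )
    path.write_text(json.dumps(patched, indent=2) + "\n")
```

The guard enforces the invariant the workaround depends on: if the override fires, the engine is built with a sequence length no larger than the one the truss config intended.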

