Add ood_generalization_test.py: tests the model against realistic out-of-distribution (OOD) scenarios to expose overfitting
Commit e93a3de (verified), KD099 committed 8 days ago