2026-01-22 06:56:15,580 - __main__ - INFO - Loading model: Kiria-Nozan/Intern-s1-mini-distill-dsv32-11k-samples
2026-01-22 06:56:15,580 - __main__ - INFO - Output directory: /vast/home/j/jojolee/therapeutic-tuning/results/sft/rejection_sampling_pgb_clin_herg/sft_rejection_sampling_pgb_clin_herg_Intern-s1-mini-distill-dsv32-11k-samples_lr1e-05/2026-01-22_06-56
2026-01-22 06:56:15,580 - __main__ - INFO - Datasets: ['rejection_sampling_pgb_clin_herg']
2026-01-22 06:56:16,052 - __main__ - INFO - Loading model 'Kiria-Nozan/Intern-s1-mini-distill-dsv32-11k-samples' with attn_implementation='flash_attention_2'
2026-01-22 06:56:16,795 - accelerate.utils.modeling - INFO - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
2026-01-22 06:56:20,038 - __main__ - INFO - Loading dataset 'rejection_sampling_pgb_clin_herg' from LoaderRegistry...
2026-01-22 06:56:20,039 - data.loaders.sft.rejection_sampling - INFO - Loading trajectories from task: Pgp_Broccatelli
2026-01-22 06:56:20,120 - data.loaders.sft.rejection_sampling - INFO - Loaded 798 examples from Pgp_Broccatelli
2026-01-22 06:56:20,120 - data.loaders.sft.rejection_sampling - INFO - Loading trajectories from task: ClinTox
2026-01-22 06:56:20,233 - data.loaders.sft.rejection_sampling - INFO - Loaded 949 examples from ClinTox
2026-01-22 06:56:20,233 - data.loaders.sft.rejection_sampling - INFO - Loading trajectories from task: hERG
2026-01-22 06:56:20,280 - data.loaders.sft.rejection_sampling - INFO - Loaded 420 examples from hERG
2026-01-22 06:56:20,694 - data.loaders.sft.rejection_sampling - INFO - Total examples after filtering: 2167
2026-01-22 06:56:20,695 - __main__ - INFO - -> Loaded 2167 examples from 'rejection_sampling_pgb_clin_herg'
2026-01-22 06:56:20,740 - __main__ - INFO - Filtered out 3 traces exceeding ~32768 tokens
2026-01-22 06:56:20,740 - __main__ - INFO - Total dataset size: 2164 examples
2026-01-22 06:56:20,740 - __main__ - INFO - Training mode: completion_only
2026-01-22 06:56:20,740 - __main__ - INFO - dataset_text_field=None, completion_only_loss=True, assistant_only_loss=False
2026-01-22 06:57:58,224 - liger_kernel.transformers.monkey_patch - INFO - There are currently no Liger kernels supported for model type: interns1.
2026-01-22 06:57:58,234 - __main__ - INFO - Verifying dataloader integrity...
2026-01-22 06:57:58,235 - __main__ - INFO - # of Batches: 249
2026-01-22 06:58:07,945 - __main__ - INFO - Training batch stats - Avg samples per batch: 8.69, Min: 5, Max: 12
2026-01-22 06:58:07,945 - __main__ - INFO - Starting training...
2026-01-22 07:27:44,000 - __main__ - INFO - Pushing model to HuggingFace Hub: jiosephlee/sft_rejection_sampling_pgb_clin_herg_Intern-s1-mini-distill-dsv32-11k-samples_lr1e-05