2026-01-22 06:56:15,580 - __main__ - INFO - Loading model: Kiria-Nozan/Intern-s1-mini-distill-dsv32-11k-samples
2026-01-22 06:56:15,580 - __main__ - INFO - Output directory: /vast/home/j/jojolee/therapeutic-tuning/results/sft/rejection_sampling_pgb_clin_herg/sft_rejection_sampling_pgb_clin_herg_Intern-s1-mini-distill-dsv32-11k-samples_lr1e-05/2026-01-22_06-56
2026-01-22 06:56:15,580 - __main__ - INFO - Datasets: ['rejection_sampling_pgb_clin_herg']
2026-01-22 06:56:16,052 - __main__ - INFO - Loading model 'Kiria-Nozan/Intern-s1-mini-distill-dsv32-11k-samples' with attn_implementation='flash_attention_2'
2026-01-22 06:56:16,795 - accelerate.utils.modeling - INFO - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
2026-01-22 06:56:20,038 - __main__ - INFO - Loading dataset 'rejection_sampling_pgb_clin_herg' from LoaderRegistry...
2026-01-22 06:56:20,039 - data.loaders.sft.rejection_sampling - INFO - Loading trajectories from task: Pgp_Broccatelli
2026-01-22 06:56:20,120 - data.loaders.sft.rejection_sampling - INFO - Loaded 798 examples from Pgp_Broccatelli
2026-01-22 06:56:20,120 - data.loaders.sft.rejection_sampling - INFO - Loading trajectories from task: ClinTox
2026-01-22 06:56:20,233 - data.loaders.sft.rejection_sampling - INFO - Loaded 949 examples from ClinTox
2026-01-22 06:56:20,233 - data.loaders.sft.rejection_sampling - INFO - Loading trajectories from task: hERG
2026-01-22 06:56:20,280 - data.loaders.sft.rejection_sampling - INFO - Loaded 420 examples from hERG
2026-01-22 06:56:20,694 - data.loaders.sft.rejection_sampling - INFO - Total examples after filtering: 2167
2026-01-22 06:56:20,695 - __main__ - INFO - -> Loaded 2167 examples from 'rejection_sampling_pgb_clin_herg'
2026-01-22 06:56:20,740 - __main__ - INFO - Filtered out 3 traces exceeding ~32768 tokens
2026-01-22 06:56:20,740 - __main__ - INFO - Total dataset size: 2164 examples
2026-01-22 06:56:20,740 - __main__ - INFO - Training mode: completion_only
2026-01-22 06:56:20,740 - __main__ - INFO - dataset_text_field=None, completion_only_loss=True, assistant_only_loss=False
2026-01-22 06:57:58,224 - liger_kernel.transformers.monkey_patch - INFO - There are currently no Liger kernels supported for model type: interns1.
2026-01-22 06:57:58,234 - __main__ - INFO - Verifying dataloader integrity...
2026-01-22 06:57:58,235 - __main__ - INFO - # of Batches: 249
2026-01-22 06:58:07,945 - __main__ - INFO - Training batch stats - Avg samples per batch: 8.69, Min: 5, Max: 12
2026-01-22 06:58:07,945 - __main__ - INFO - Starting training...
2026-01-22 07:27:44,000 - __main__ - INFO - Pushing model to HuggingFace Hub: jiosephlee/sft_rejection_sampling_pgb_clin_herg_Intern-s1-mini-distill-dsv32-11k-samples_lr1e-05