Fix ValidationDataset EOS separator inconsistency

ValidationDataset always appended EOS, ignoring use_eos_separator, so
val_loss was not comparable to train_loss when the flag was False.
All three dataset classes (Packed, Mixed, Validation) now share the same
conditional EOS logic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Files changed (1)

llm_lab/data/dataset.py +2 -1

@@ -205,7 +205,8 @@ class ValidationDataset:
             if not token_ids:
                 continue
-            token_ids.append(self.tokenizer.eos_id)
+            if self.config.use_eos_separator:
+                token_ids.append(self.tokenizer.eos_id)
             buffer.extend(token_ids)
             while len(buffer) >= self.config.max_seq_len + 1 and count < self.num_samples:
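
For reviewers, the effect of the flag on packed samples can be illustrated with a minimal standalone sketch. This is not the code in dataset.py: the function name pack_token_ids, its parameters, and the way the buffer is advanced after each sample are assumptions made for illustration. Only the conditional EOS append mirrors the diff above.

```python
from typing import Iterable, Iterator, List


def pack_token_ids(
    docs: Iterable[List[int]],
    max_seq_len: int,
    eos_id: int,
    use_eos_separator: bool,
) -> Iterator[List[int]]:
    """Yield samples of max_seq_len + 1 tokens from concatenated documents.

    Hypothetical helper; only the conditional EOS append follows the diff.
    """
    buffer: List[int] = []
    for token_ids in docs:
        if not token_ids:
            continue
        token_ids = list(token_ids)  # copy so the caller's list is not mutated
        if use_eos_separator:
            token_ids.append(eos_id)
        buffer.extend(token_ids)
        while len(buffer) >= max_seq_len + 1:
            yield buffer[: max_seq_len + 1]
            # Assumption: consumed tokens are dropped with no overlap.
            buffer = buffer[max_seq_len + 1 :]


docs = [[1, 2], [3, 4], [5, 6]]
with_eos = list(pack_token_ids(docs, max_seq_len=2, eos_id=0, use_eos_separator=True))
without_eos = list(pack_token_ids(docs, max_seq_len=2, eos_id=0, use_eos_separator=False))
print(with_eos)
print(without_eos)
```

With the flag on, each sample carries an EOS token between documents. With it off, documents are concatenated directly, so the token stream that validation scores has the same shape as the one training sees under the same setting.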