schemashift / training

Commit History

Phase 13 Stage 1 v2: fix termination signal - reward penalties for non-termination, reduced max_tokens, stop_strings
5a432bb

yashash04 commited on

Phase 13 Stage 1: pivot to winner's pattern - TRL 0.29.0 + fast_inference=False + DAPO loss
cf4ce7e

yashash04 commited on

Phase 13 Stage 1: fix Unicode encoding artifacts in notebook comments
b0a62f6

yashash04 commited on

Phase 13 Stage 1: drop vLLM, use Kaggle native torch 2.10, upgrade trl to 0.18+ (Kaggle dep compatibility)
7247cb4

yashash04 commited on

Phase 13 Stage 1: apply all 7 notebook audit fixes (account labels + health check + hub namespace + explicit commenting)
dec6785

yashash04 commited on

Phase 13 Stage 1 prep: Kaggle runbook + Run 1 config pre-populated (notebook audit flags pending approval)
453c504

yashash04 commited on

Phase 8: Env client + GRPO training skeleton (Kaggle notebook)
2464e9e

yashash04 commited on

first commit
d4ab0f1

yashash04 commited on