Commit History

Upload folder using huggingface_hub
a7c990b
verified

A-HK committed on

Update README: add Round 2 generalization results, procedural eval data, train_continuation.py
45ae1a6
verified

A-HK committed on

Rewrite BLOG.md: add Round 2 generalization results, full technical depth, honest limitations
62732ff
verified

A-HK committed on

Upload folder using huggingface_hub
e43736b
verified

A-HK committed on

Upload folder using huggingface_hub
97524be
verified

A-HK committed on

Add continuation training script: procedural scenarios, generalization eval
4bbb603
verified

A-HK committed on

Rewrite README with full technical implementation details, research references, real training numbers
9e217f9
verified

A-HK committed on

Upload folder using huggingface_hub
98dce4a
verified

A-HK committed on

Upload folder using huggingface_hub
23d9813
verified

A-HK committed on

Clean up: remove build artifacts, debug logs, old notebook, internal docs
c4e8937
verified

A-HK committed on

Upload folder using huggingface_hub
6c62fb1
verified

A-HK committed on

Fix plot crash: filter None values from metrics before np.convolve
4728f59
verified

A-HK committed on

Upload artifacts/baseline_vs_trained.png with huggingface_hub
df3e8b5
verified

A-HK committed on

Upload artifacts/training_health.png with huggingface_hub
e695c2f
verified

A-HK committed on

Upload artifacts/loss_curve.png with huggingface_hub
b6d4d4d
verified

A-HK committed on

Upload artifacts/reward_curve.png with huggingface_hub
075873a
verified

A-HK committed on

Fix plot NoneType bug in rewards/entropy lists
b0e5cb2
verified

A-HK committed on

Redesigned train_job.py: fix reward signal (100x), DAPO-style GRPO, curriculum learning
daceb77
verified

A-HK committed on

Upload redesigned train_job.py header (full upload via job)
5e7e929
verified

A-HK committed on

Upload folder using huggingface_hub
280be27
verified

A-HK committed on

Upload folder using huggingface_hub
9a03233
verified

A-HK committed on

Upload folder using huggingface_hub
22c2a42
verified

A-HK committed on

Upload folder using huggingface_hub
6ff212f
verified

A-HK committed on

Upload folder using huggingface_hub
589f1cb
verified

A-HK committed on

Delete round2-hackathon-spec.txt
ccc27e0
verified

A-HK committed on

Delete OpenEnv Hackathon - Provided Resources and Guidelines.pdf
8abfce0
verified

A-HK committed on

Add anti-reward-hacking test suite: scan_logs spam, false positive exploitation, conclude gaming, repeat action exploit
8a72c51
verified

A-HK committed on

Fix train_grpo.py: rename SFT path, add actual GRPO CLI entrypoint with wired-in procedural gen + trajectory filter + adversarial attacker
5214eea
verified

A-HK committed on

Upload folder using huggingface_hub
f876b60
verified

A-HK committed on

Rewrite README as clear project blog with all features documented
95852d4
verified

A-HK committed on

Fix: set response_schema for Qwen2.5 tool-calling + add jmespath dep
598c4c8
verified

A-HK committed on

Remove temp file
dd45c68
verified

A-HK committed on

Fix: remove max_prompt_length (removed in TRL 1.x, Unsloth rejects it)
93ce912
verified

A-HK committed on

Remove max_prompt_length from GRPOConfig (removed in TRL 1.x, Unsloth rejects it)
f6cacba
verified

A-HK committed on

Fix dtype='auto' -> None for Unsloth compatibility (T4 requires float16, Unsloth auto-detects with None)
62e0566
verified

A-HK committed on

Fix total_mem bug + integrate procedural gen, security APIs, StarPO-S, adversarial attacker into notebook
2b7135f
verified

A-HK committed on

Add adversarial attacker for self-play curriculum

Reference: SPIRAL (arXiv 2506.24119)
Adaptive attacker that adds decoys, obscures IOCs, tightens deadlines,
and exploits defender blind spots. Creates automatic difficulty scaling.
93a36f8
verified

A-HK committed on

Add StarPO-S trajectory filtering for GRPO stability

Reference: RAGEN (arXiv 2504.20073)
Filters to keep only informative rollouts, preventing echo trap collapse.
60fa6a8
verified

A-HK committed on

Add real-world security API integrations (OSV.dev, deps.dev, GHSA, npm)

Free, no-auth-required live lookups against security databases.
Adds genuine information-gathering vs time-pressure tradeoff.
581cdcc
verified

A-HK committed on

Add procedural scenario generation (infinite training variation)

Generates unique supply-chain incidents from parameterized distributions.
Reference: Self-Evolving Curriculum (arXiv 2505.14970)
1053202
verified

A-HK committed on

Replace notebook with complete GRPO training pipeline (TRL environment_factory + Unsloth QLoRA)
5d6d233
verified

A-HK committed on

Upload folder using huggingface_hub
4f73b89
verified

A-HK committed on

Upload folder using huggingface_hub
bccfde3
verified

A-HK committed on

Upload folder using huggingface_hub
cd43a41
verified

A-HK committed on

Upload folder using huggingface_hub
60f4610
verified

A-HK committed on

Upload folder using huggingface_hub
b25217e
verified

A-HK committed on

Upload folder using huggingface_hub
26bb614
verified

A-HK committed on

Upload folder using huggingface_hub
193594b
verified

A-HK committed on

Upload folder using huggingface_hub
580d6c5
verified

A-HK committed on

Upload folder using huggingface_hub
5080156
verified

A-HK committed on