# Phase 1: Base Infrastructure Setup Complete
## Directory Structure Setup
### Newly Created Project
- `/home/ubuntu/RLVR/TestTime-RLVR-v2/` - new project based on AZR
### Core Directory Structure
```
TestTime-RLVR-v2/
├── absolute_zero_reasoner/
│   ├── testtime/              # TestTime-specific components
│   │   ├── __init__.py        # Module initialization
│   │   └── config.py          # TestTime configuration
│   ├── utils/code_utils/      # AZR Python Executor (existing)
│   ├── rewards/               # AZR Reward Manager (existing)
│   └── trainer/ppo/           # AZR PPO Trainer (existing)
├── logs/                      # Logging system
│   ├── problems/              # Per-problem logs
│   ├── ipo_extraction/        # IPO extraction logs
│   ├── task_generation/       # Task generation logs
│   ├── training/              # Training logs
│   └── performance/           # Performance change logs
├── evaluation/code_eval/data/ # Benchmark data
│   ├── HumanEvalPlus.jsonl    # ✅ Verified present
│   └── MbppPlus.jsonl         # ✅ Verified present
└── Update/                    # Change tracking
```
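As an illustration of how the `logs/` layout could be created reproducibly, here is a minimal sketch. The helper name `create_log_dirs` and the `LOG_SUBDIRS` constant are assumptions made for this example and are not part of the project; only the subdirectory names come from the tree.

```python
from pathlib import Path
from typing import List

# Subdirectories under logs/, taken from the directory tree.
LOG_SUBDIRS = (
    "problems",
    "ipo_extraction",
    "task_generation",
    "training",
    "performance",
)


def create_log_dirs(project_root: Path) -> List[Path]:
    """Create logs/<subdir> under project_root (hypothetical helper).

    Existing directories are left untouched, so repeated calls are safe.
    """
    created = []
    for name in LOG_SUBDIRS:
        path = Path(project_root) / "logs" / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_log_dirs(Path("/home/ubuntu/RLVR/TestTime-RLVR-v2"))` would produce the five log directories listed in the tree.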
## Core Components Created
### 1. TestTimeConfig class
- **Location**: `absolute_zero_reasoner/testtime/config.py`
- **Purpose**: Manages all TestTime RLVR configuration
- **Notes**: Adds TestTime-specific settings while staying compatible with AZR
### 2. BenchmarkConfig class
- **Location**: `absolute_zero_reasoner/testtime/config.py`
- **Purpose**: Per-benchmark configuration (HumanEval+, MBPP+)
- **Notes**: Manages each benchmark's start index, data path, and related settings
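The two configuration classes could look roughly like the sketch below. Only the class names, the data file paths, and the existence of a per-benchmark start index and path come from this document; every field name, default value, dictionary key, and the `default_config` helper are assumptions for illustration, and the actual `config.py` may differ.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BenchmarkConfig:
    """Per-benchmark settings (field names are assumed)."""
    name: str
    data_path: str
    start_index: int = 0  # index of the first problem to use


@dataclass
class TestTimeConfig:
    """Top-level TestTime RLVR settings (field names are assumed)."""
    log_dir: str = "logs"
    benchmarks: Dict[str, BenchmarkConfig] = field(default_factory=dict)


def default_config() -> TestTimeConfig:
    """Build a config that points at the two benchmark files in the tree."""
    data_dir = "evaluation/code_eval/data"
    return TestTimeConfig(
        benchmarks={
            # The dictionary keys are hypothetical identifiers.
            "humaneval_plus": BenchmarkConfig(
                "HumanEval+", f"{data_dir}/HumanEvalPlus.jsonl"
            ),
            "mbpp_plus": BenchmarkConfig("MBPP+", f"{data_dir}/MbppPlus.jsonl"),
        }
    )
```

Using `field(default_factory=dict)` gives each config instance its own dictionary rather than sharing a mutable default between instances.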
## Completed Tasks
1. **Project copy**: AZR → TestTime-RLVR-v2
2. **Directory structure**: Created the log and component directories
3. **Base configuration**: Wrote the TestTimeConfig and BenchmarkConfig classes
4. **Data check**: Confirmed that the HumanEval+ and MBPP+ data files exist
5. **Module structure**: Initialized the testtime package
## Next Steps (Phase 2)
1. Implement **BenchmarkProblemLoader** - loads benchmark problems
2. Implement **InitialSolutionGenerator** - generates initial solutions
3. Implement the **benchmark verification system** - checks solution correctness
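Since the benchmark data is stored as `.jsonl` files, the loading step planned for Phase 2 might start from something like the sketch below. The function name `load_jsonl_problems` and its parameters are hypothetical; it makes no assumption about which fields each record contains and is not the planned BenchmarkProblemLoader itself.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_jsonl_problems(path, start_index: int = 0) -> List[Dict[str, Any]]:
    """Read one JSON object per line and return records from start_index on.

    Blank lines are skipped. Records are returned as plain dictionaries.
    """
    problems = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                problems.append(json.loads(line))
    return problems[start_index:]
```

A `start_index` argument mirrors the per-benchmark start index that BenchmarkConfig is described as managing.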
## Key Design Principles
- **AZR compatibility**: Reuse existing AZR components wherever possible
- **Lightweight**: Fast adaptive learning suited to test-time use
- **Comprehensive logging**: Record detailed logs at every stage
- **Modularity**: Each component can be tested independently
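One way the logging principle could map onto the `logs/` subdirectories is a logger per stage, sketched below with the standard `logging` module. The function name `get_stage_logger`, the `testtime.<stage>` logger naming, and the `<stage>.log` file name are all assumptions for this example.

```python
import logging
from pathlib import Path


def get_stage_logger(stage: str, log_root="logs") -> logging.Logger:
    """Return a logger that writes to <log_root>/<stage>/<stage>.log.

    Hypothetical helper: the handler is attached only once per stage name,
    so later calls return the same logger without duplicating output.
    """
    log_dir = Path(log_root) / stage
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"testtime.{stage}")
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        handler = logging.FileHandler(log_dir / f"{stage}.log", encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

For example, `get_stage_logger("training").info("epoch finished")` would append a line to `logs/training/training.log`. Keeping one logger per stage also lets each component be exercised in isolation with its own log file.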
---
**Date written**: 2025-07-16
**Status**: ✅ Complete