# Phase 3: IPO Triple Extraction System Complete
## ✅ Implemented Components
### 1. IPOTripleExtractor
- **File**: `absolute_zero_reasoner/testtime/ipo_extractor.py`
- **Features**:
  - Safe code execution built on the AZR Python Executor
  - Extracts input-output pairs from test cases
  - Generates IPO triples by executing the solution
  - Generates additional triples from synthetic inputs
  - Validates triples and checks their consistency
- **Based on**: logic from `python_executor.py` and `azr_ray_trainer.py`
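
The test-case extraction step listed above can be sketched as follows. This is a minimal illustration, not the code in `ipo_extractor.py`: the helper names `parse_assert` and `build_triple` are hypothetical, the sketch only handles asserts of the form `assert f(args) == expected`, and it runs the solution in-process with `exec`/`eval` where the real system delegates to the AZR Python Executor.

```python
import ast


def parse_assert(assert_src: str, function_name: str):
    """Split `assert f(<args>) == <expected>` into (args_src, expected_src).

    Returns None when the statement does not have exactly that shape.
    Hypothetical helper; keyword arguments are ignored in this sketch.
    """
    try:
        node = ast.parse(assert_src.strip()).body[0]
    except (SyntaxError, IndexError):
        return None
    if not isinstance(node, ast.Assert) or not isinstance(node.test, ast.Compare):
        return None
    compare = node.test
    call = compare.left
    if len(compare.ops) != 1 or not isinstance(compare.ops[0], ast.Eq):
        return None
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            and call.func.id == function_name):
        return None
    # ast.unparse (Python 3.9+) turns the argument nodes back into source text.
    args_src = ", ".join(ast.unparse(arg) for arg in call.args)
    return args_src, ast.unparse(compare.comparators[0])


def build_triple(problem_id, index, program, function_name, assert_src):
    """Build one IPO triple record from a solution and a single test assert."""
    parsed = parse_assert(assert_src, function_name)
    if parsed is None:
        return None
    args_src, expected_src = parsed
    namespace = {}
    # WARNING: in-process exec/eval is for illustration only and is NOT safe
    # for untrusted code; the real pipeline uses the AZR Python Executor.
    exec(program, namespace)
    actual = eval(f"{function_name}({args_src})", namespace)
    expected = eval(expected_src, namespace)
    return {
        "id": f"{problem_id}_triple_{index}",
        "input": args_src,
        "program": program,
        "expected_output": expected_src,
        "actual_output": repr(actual),
        "function_name": function_name,
        "is_correct": actual == expected,
        "extraction_method": "test_case",
    }
```

For example, `build_triple("demo", 0, "def add_two(x):\n    return x + 2", "add_two", "assert add_two(3) == 5")` yields a record whose `input` is `"3"` and whose `is_correct` flag is `True`.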
### 2. TestTimeTaskGenerator
- **File**: `absolute_zero_reasoner/testtime/task_generator.py`
- **Features**:
  - Induction: infer the function from input-output pairs
  - Deduction: infer the output from function + input
  - Abduction: infer the input from function + output
  - AZR-based template system
  - Training dataset generation
- **Based on**: templates from `prompts.py` and `constructor.py`
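
As a rough illustration of how one IPO triple can be turned into the three task types, consider the sketch below. The template wording, the `TEMPLATES` dictionary, and `generate_tasks` are assumptions made for this example; the actual prompts come from the AZR `prompts.py` and `constructor.py` templates and are not reproduced here.

```python
# Hypothetical prompt templates; the real ones live in the AZR code base.
TEMPLATES = {
    "induction": (
        "Infer the function from this input-output pair.\n"
        "Input: {input}\nOutput: {output}"
    ),
    "deduction": (
        "Predict the output of this function for the given input.\n"
        "{program}\nInput: {input}"
    ),
    "abduction": (
        "Find an input for which this function returns the given output.\n"
        "{program}\nOutput: {output}"
    ),
}


def generate_tasks(triple: dict) -> list:
    """Derive one induction, one deduction, and one abduction task from a triple."""
    program = triple["program"]
    inp = triple["input"]
    out = triple["actual_output"]
    return [
        {
            "task_type": "induction",
            "prompt": TEMPLATES["induction"].format(input=inp, output=out),
            "answer": program,  # the hidden element is the program
        },
        {
            "task_type": "deduction",
            "prompt": TEMPLATES["deduction"].format(program=program, input=inp),
            "answer": out,  # the hidden element is the output
        },
        {
            "task_type": "abduction",
            "prompt": TEMPLATES["abduction"].format(program=program, output=out),
            # Any input that reproduces the output is acceptable;
            # the original input is stored as one reference answer.
            "answer": inp,
        },
    ]
```

Each task hides exactly one element of the (input, program, output) triple, which is why a single triple is enough to produce all three task types.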
## 🧪 Test Results
### IPO Extraction System Tests (✅ 3/3 passed)
```
IPO Extractor: ✅ PASS
Task Generator: ✅ PASS
Integrated Pipeline: ✅ PASS
```
### Verified Features
- ✅ **IPO extraction**: 5 of 6 generated triples were valid
- ✅ **Task generation**: 4 tasks (I:1, D:1, A:2)
- ✅ **Integrated pipeline**: the Mbpp/2 problem was processed end to end
- ✅ **AZR Python Executor**: safe code execution confirmed
## Performance Metrics
### IPO Extraction Performance
- **Test problem**: `add_two(x)`, a simple function
- **Extracted triples**: 5 valid out of 6 (83% validity)
- **Execution time**: ~0.5 s
### Task Generation Performance
- **MBPP problem**: the `similar_elements` function
- **Generated tasks**: 4 (at least one per task type)
- **Task distribution**: Induction (25%), Deduction (25%), Abduction (50%)
### Integrated Pipeline
```
1. Problem loading ✅ → 2. IPO extraction ✅ → 3. Task generation ✅
```
## Core Technology Verification
### 1. AZR Python Executor Integration
- **ProcessPool-based**: sandboxed execution for safety
- **Timeout management**: 5-second limit, tuned for TestTime use
- **Error handling**: syntax errors and runtime errors are handled separately
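
The three behaviours above (process isolation, a hard timeout, and separate syntax/runtime error paths) can be sketched as follows. Note that this example deliberately swaps in the standard-library `subprocess` module for the ProcessPool-based AZR executor, so it only illustrates the control flow; `run_snippet` and its result format are hypothetical, and a plain child process is not a real security sandbox.

```python
import subprocess
import sys


def run_snippet(code: str, timeout_s: float = 5.0) -> dict:
    """Run `code` in a separate Python process and classify the outcome."""
    # Step 1: catch syntax errors up front, without starting a process.
    try:
        compile(code, "<snippet>", "exec")
    except SyntaxError as exc:
        return {"status": "syntax_error", "detail": str(exc)}

    # Step 2: execute in a child process; subprocess.run kills the child
    # and raises TimeoutExpired once the limit is exceeded.
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "detail": f"exceeded {timeout_s}s"}

    # Step 3: a non-zero exit code means the snippet raised at runtime.
    if proc.returncode != 0:
        stderr_lines = proc.stderr.strip().splitlines()
        detail = stderr_lines[-1] if stderr_lines else ""
        return {"status": "runtime_error", "detail": detail}

    return {"status": "ok", "output": proc.stdout.strip()}
```

Checking syntax in the parent process keeps malformed programs from paying the cost of a process launch, while the timeout guards against programs that never terminate.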
### 2. IPO Triple Structure
```json
{
  "id": "Mbpp/2_triple_0",
  "input": "(3, 4, 5, 6), (5, 7, 4, 10)",
  "program": "def similar_elements(test_tup1, test_tup2):\n return tuple(set(test_tup1) & set(test_tup2))",
  "expected_output": "(4, 5)",
  "actual_output": "(4, 5)",
  "function_name": "similar_elements",
  "is_correct": true,
  "extraction_method": "test_case"
}
```
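
A record of this shape maps naturally onto a small typed container. The sketch below is an assumption about how validation could look, not the project's actual classes: `IPOTriple`, `load_triple`, `is_consistent`, and `validity_rate` are hypothetical names, and the consistency rule (the `is_correct` flag must agree with a strict string comparison of the two output fields) is a simplification chosen for this example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class IPOTriple:
    """Typed view of one IPO triple record (field names taken from the JSON)."""
    id: str
    input: str
    program: str
    expected_output: str
    actual_output: str
    function_name: str
    is_correct: bool
    extraction_method: str


def load_triple(raw: str) -> IPOTriple:
    """Parse a JSON string into an IPOTriple; unknown keys raise TypeError."""
    return IPOTriple(**json.loads(raw))


def is_consistent(triple: IPOTriple) -> bool:
    """Assumed rule: is_correct must match expected_output == actual_output."""
    return triple.is_correct == (triple.expected_output == triple.actual_output)


def validity_rate(triples) -> float:
    """Fraction of triples that are both marked correct and self-consistent."""
    if not triples:
        return 0.0
    valid = sum(1 for t in triples if t.is_correct and is_consistent(t))
    return valid / len(triples)
```

Under this rule, five valid triples out of six give a validity rate of 5/6, which matches the roughly 83% figure reported for the extraction test.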
### 3. Three Task Templates
- **Induction**: "Infer the function from the input-output pairs"
- **Deduction**: "Predict the output from the function and input"
- **Abduction**: "Find the input from the function and output"
## Updated Structure
```
TestTime-RLVR-v2/absolute_zero_reasoner/testtime/
├── __init__.py            # ✅ IPO, Task added
├── config.py              # ✅ Done
├── benchmark_loader.py    # ✅ Done
├── solution_generator.py  # ✅ Done
├── ipo_extractor.py       # 🆕 IPO extraction system
├── task_generator.py      # 🆕 Three-type task generation
└── logger.py              # ✅ Done
```
## Logging System Usage
### Requirements Compliance Check
- ✅ **Requirement 2**: IPO extraction and task generation are logged
- ✅ **Structured logs**: saved as JSON under `/tmp/azr/logs/`
- ✅ **Real-time monitoring**: the extraction and generation process is tracked step by step
### Log Categories
```
logs/
├── ipo_extraction/   # Detailed IPO extraction logs
├── task_generation/  # Task generation logs
├── problems/         # Per-problem processing logs
└── training/         # Reserved for future training logs
```
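
One way to write structured JSON records into these category directories is sketched below. It is not the implementation in `logger.py`: the `write_log` function, the `events.jsonl` file name, and the record fields are assumptions made for illustration, and the base directory is passed in explicitly so the example does not depend on `/tmp/azr/logs/` existing.

```python
import json
import tempfile
import time
from pathlib import Path

# Category names taken from the directory layout above.
CATEGORIES = {"ipo_extraction", "task_generation", "problems", "training"}


def write_log(base_dir, category: str, event: dict) -> Path:
    """Append one JSON record to <base_dir>/<category>/events.jsonl."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown log category: {category}")
    log_dir = Path(base_dir) / category
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / "events.jsonl"  # hypothetical file name
    record = {"timestamp": time.time(), **event}
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


if __name__ == "__main__":
    # Usage example against a throwaway directory.
    base = tempfile.mkdtemp()
    log_path = write_log(base, "ipo_extraction",
                         {"problem_id": "Mbpp/2", "step": "extract", "valid": 5})
    print(log_path.read_text(encoding="utf-8"))
```

Writing one JSON object per line keeps each step of the extraction and generation process independently parseable, which suits step-by-step tracking.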
## Next Steps (Phase 4)
The **RLVR training system** to be implemented in Phase 4:
1. **TestTimeRewardManager** - based on AZR `reward_managers.py`
2. **TestTimeRLVRTrainer** - uses AZR PPO/REINFORCE++
3. **Performance evaluation system** - measures the effect of iterative training
### AZR Component Reuse Plan
- `rewards/reward_managers.py` - reuse the r_solve function
- `trainer/ppo/reason_rl_ray_trainer.py` - PPO training logic
- veRL framework integration

---
**Date written**: 2025-07-16
**Status**: ✅ Complete
**Tests**: ✅ Passed (3/3)
**Key achievements**: successful integration with the AZR Python Executor and a complete IPO pipeline