# MASTER CHECKLIST: WHAT NEEDS TO HAPPEN FOR PART 1

## Files Already Prepared (✓ Done)

| File | Purpose | Status |
|------|---------|--------|
| `PART1_QUICK_SUMMARY.md` | 1-page reference guide for venue | ✓ READY |
| `PART1_DEVELOPMENT_TRAINING_CHECKLIST.md` | Detailed step-by-step instructions | ✓ READY |
| `generate_curves.py` | Curve generation after training | ✓ READY |
| `BLOG_POST_TEMPLATE.md` | Storytelling framework | ✓ READY |
| `training/train.py` | Training script | ✓ READY |
| `training/config.yaml` | Optimized config (1500 episodes) | ✓ READY |
| `training/warmup_traces.jsonl` | SFT warmup data (20 examples) | ✓ READY |
| `permanence/env.py` | Core environment | ✓ READY |

---

## PART 1: DEVELOPMENT & TRAINING BREAKDOWN

### What Happens in PART 1
**At venue: 11:30 AM - 8:00 PM (8.5 hours)**

PART 1 is about **generating evidence that your environment actually teaches agents something.**

---

## WHAT YOU NEED TO DO (Concrete Tasks)

### PRE-VENUE (Before you leave today)

**Task 1.1: Verify repo is in good state**
```bash
cd c:\Users\Hp\OneDrive\Desktop\meta
git status  # Should show nothing uncommitted
git log -1  # Last commit: "Add OpenEnv deployment files..."
```
Expected: No uncommitted changes, repo clean

**Task 1.2: Verify dependencies are specified**
```bash
grep -A 10 dependencies pyproject.toml
```
Expected: Lists torch, transformers, trl, unsloth, datasets, peft

**Task 1.3: Verify training config is correct**
```bash
cat training/config.yaml
```
Expected: `total_episodes: 1500`, `group_size: 8`, `load_in_4bit: true`
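
If you would rather not eyeball the file, the check above can be sketched in Python. This is a minimal sketch that assumes the three settings appear as plain `key: value` lines somewhere in `training/config.yaml` (possibly indented under a section); the actual layout of the config is not shown here, and `check_config_text` is a hypothetical helper, not part of the repo.

```python
import re
from pathlib import Path

# Expected values taken from the checklist. The `key: value` line layout is an
# assumption; adjust the pattern if the real config stores these differently.
EXPECTED = {"total_episodes": "1500", "group_size": "8", "load_in_4bit": "true"}


def check_config_text(text: str) -> dict:
    """Return {key: found_value_or_None} for every key that does not match EXPECTED."""
    problems = {}
    for key, want in EXPECTED.items():
        # Match `key: value` at any indentation, stopping before a trailing comment.
        match = re.search(rf"^\s*{re.escape(key)}\s*:\s*([^\s#]+)", text, re.MULTILINE)
        got = match.group(1) if match else None
        if got is None or got.lower() != want:
            problems[key] = got
    return problems


if __name__ == "__main__":
    config_path = Path("training/config.yaml")
    if config_path.exists():
        issues = check_config_text(config_path.read_text())
        print("config OK" if not issues else f"config mismatch: {issues}")
    else:
        print(f"{config_path} not found")
```

An empty result means all three values match; any key in the result maps to the value actually found, or `None` if the key is missing.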

---

### AT VENUE: PHASE 1 (11:30 AM - 12:00 PM) - GPU Setup

**Task 2.1: Get GPU access**
- Find venue staff
- Get SSH credentials or Colab link
- **CRITICAL:** Confirm GPU type (A100, RTX 4090, H100, etc.)
- If NO GPU: Escalate immediately to L2 mentor

**Task 2.2: Verify CUDA works**
```bash
python -c "import torch; print(torch.cuda.get_device_name(0)); print(f'{torch.cuda.get_device_properties(0).total_memory / 1e9:.0f}GB')"
```
Expected: Should print GPU name and memory (e.g., "A100" and "40GB")

**Task 2.3: Clone repo and install dependencies**
```bash
git clone https://github.com/chanikkyasaai/permanence
cd permanence
pip install -e .
pip install torch transformers trl unsloth datasets peft
```
Expected: No errors, all packages install successfully

**Task 2.4: Verify environment works**
```bash
python -c "from permanence.env import PermanenceEnv; print('✓ OK')"
```
Expected: Prints "✓ OK"

**By 12:00 PM: You should have GPU ready, repo cloned, dependencies installed, environment verified.**

---

### AT VENUE: PHASE 2 (12:00 PM - 7:30 PM) - Training Execution

**Task 3.1: START TRAINING (single command)**
```bash
python -m training.train --config training/config.yaml
```

**That's it. Press Enter. Training runs for 7 hours unattended.**

**What happens next:**
- Minutes 0-1: Model loading
- Minutes 1-3: Data loading
- Minutes 3-420: Training (1,500 episodes × ~0.28 min/episode, about 17 s each)
- Every 100 episodes: Progress printed to console
- Output: `permanence_output/training_log.json` with all metrics

**You can relax, walk around, eat, prepare for Part 2. Just don't close the terminal.**

**Checkpoint:** A checkpoint is saved every 500 episodes. If training crashes at episode 1400, you can resume from the most recent checkpoint (episode 1000) instead of starting over.
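
To see what a crash actually costs, the checkpoint arithmetic can be sketched as below. It assumes checkpoints land exactly on multiples of the 500-episode interval, which the checklist implies but does not state; `last_checkpoint` and `episodes_to_redo` are hypothetical helper names, not functions from `training/train.py`.

```python
def last_checkpoint(crash_episode: int, interval: int = 500) -> int:
    """Episode number of the most recent checkpoint at or before a crash.

    Assumes checkpoints are written at exact multiples of `interval`.
    """
    if crash_episode < 0 or interval <= 0:
        raise ValueError("crash_episode must be >= 0 and interval must be > 0")
    return (crash_episode // interval) * interval


def episodes_to_redo(crash_episode: int, interval: int = 500) -> int:
    """Episodes that must be re-run after resuming from that checkpoint."""
    return crash_episode - last_checkpoint(crash_episode, interval)


print(last_checkpoint(1400))   # 1000
print(episodes_to_redo(1400))  # 400
```

A crash at episode 1400 therefore loses 400 episodes of work, not the whole run.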

---

### AT VENUE: PHASE 3 (7:30 PM - 8:00 PM) - Post-Training Verification

**Task 4.1: Generate training curves**
```bash
python generate_curves.py
```
Expected: Creates `results/training_curves.png` (4-panel plot)

**Task 4.2: Verify curves look good**
- Open `results/training_curves.png`
- Check Panel 1 (Reward): Should trend **upward** (from negative to positive)
- Check Panel 2 (Loss): Should trend **downward** (convergence)
- Check Panel 3 (Catastrophe): Should trend **downward** (improvement)
- Check Panel 4 (Accuracy): Should trend **upward** (improvement)

If the curves look wrong, check `permanence_output/training_log.json` for errors.
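
The four panel checks can also be run numerically rather than by eye. The sketch below compares the mean of the first and last 100 episodes for each metric. It assumes `training_log.json` is a JSON list of per-episode records, and the field names (`reward`, `loss`, `catastrophe_rate`, `prediction_accuracy`) are hypothetical; rename them to whatever the training script actually writes.

```python
import json
from pathlib import Path
from statistics import mean

# Hypothetical field names mapped to the direction each curve should move.
EXPECTED_DIRECTION = {
    "reward": "up",
    "loss": "down",
    "catastrophe_rate": "down",
    "prediction_accuracy": "up",
}


def trend(values, window=100):
    """Mean of the last `window` values minus the mean of the first `window`."""
    window = max(1, min(window, len(values) // 2))
    return mean(values[-window:]) - mean(values[:window])


def check_trends(records, window=100):
    """Return {field: bool} telling whether each curve moved as expected."""
    results = {}
    for field, direction in EXPECTED_DIRECTION.items():
        values = [r[field] for r in records if field in r]
        if len(values) < 2:
            results[field] = False  # not enough data to judge
            continue
        delta = trend(values, window)
        results[field] = delta > 0 if direction == "up" else delta < 0
    return results


if __name__ == "__main__":
    log_path = Path("permanence_output/training_log.json")
    if log_path.exists():
        # Assumes the file holds a list of per-episode dicts.
        print(check_trends(json.loads(log_path.read_text())))
```

Averaging over a window rather than comparing single episodes keeps one noisy episode from flipping the verdict.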

**Task 4.3: Verify model loads**
```bash
python -c "from transformers import AutoModelForCausalLM; m = AutoModelForCausalLM.from_pretrained('./permanence_output/final_model'); print('✓ Model loads')"
```
Expected: Prints "✓ Model loads"

**Task 4.4: Commit results**
```bash
git add permanence_output/training_log.json results/training_curves.png results/training_summary.txt
git commit -m "Training complete: 1500 episodes, reward improvement verified"
```
Expected: Commit succeeds, files tracked

**By 8:00 PM: You have training curves, metrics, and proof that the environment works.**

---

## DELIVERABLES AT END OF PART 1

By 8:00 PM, you will have:

```
permanence_output/
├── training_log.json              ← 1,500 episodes of metrics
├── final_model/                   ← Trained weights
│   └── pytorch_model.bin
└── checkpoint_*

results/
├── training_curves.png            ← ⭐ JUDGES WANT THIS
├── training_summary.txt           ← Numerical metrics
└── training_comparison.md

Git commits with all artifacts tracked
```
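
Before committing, a quick existence check on the deliverables can catch a missing file. This is a minimal sketch using only paths named in this checklist; `missing_deliverables` is a hypothetical helper, and it checks only that each path exists, not that its contents are valid.

```python
from pathlib import Path

# Paths taken from the deliverables tree and the commit step in this checklist.
REQUIRED = [
    "permanence_output/training_log.json",
    "permanence_output/final_model",
    "results/training_curves.png",
    "results/training_summary.txt",
]


def missing_deliverables(root="."):
    """Return the required paths that do not exist under `root`."""
    base = Path(root)
    return [p for p in REQUIRED if not (base / p).exists()]


if __name__ == "__main__":
    missing = missing_deliverables()
    print("all deliverables present" if not missing else f"missing: {missing}")
```

Run it from the repo root; an empty list means every listed artifact is in place.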

---

## SUCCESS CRITERIA FOR PART 1

✅ You've completed PART 1 if:

- [ ] Training ran for 7 hours without crashing
- [ ] permanence_output/training_log.json exists with 1,500 episodes
- [ ] results/training_curves.png exists and shows improvement
- [ ] Reward curve trending upward
- [ ] Catastrophe rate trending downward (from ~43% to <20%)
- [ ] Prediction accuracy trending upward (from ~31% to >50%)
- [ ] Trained model loads successfully
- [ ] All results committed to git

---

## WHAT COMES AFTER PART 1 (PART 2)

Once PART 1 is complete (8:00 PM), you'll have 21 hours until the deadline (5:00 PM the next day) to do PART 2:

**PART 2 Tasks:**
1. Write mini-blog or record <2min video explaining results
2. Update README with storytelling arc + curve + links
3. Push to HuggingFace Space
4. Update GitHub with final links
5. Submit Google Form

(PART 2 checklist will be provided separately once PART 1 is done)

---

## KEY FACTS

**PART 1 is the bottleneck.** Everything depends on getting GPU training to work.

**Judges explicitly state:** "At minimum, loss and reward plots from a real run."

**Right now:** You have 0/20 on the "Training Evidence" criterion. After PART 1, you'll have 7/20.

**The difference:** Disqualification vs. Contention.

**What must happen:** Train for 7 hours, generate curves, commit results.

**Contingency:** If GPU fails, you can still explain the technical architecture to judges. But curves are what wins.

---

## IMMEDIATE NEXT STEPS

### Today (Before Venue):
- [ ] Print or bookmark `PART1_QUICK_SUMMARY.md` (the 1-page reference for the venue)
- [ ] Review `PART1_DEVELOPMENT_TRAINING_CHECKLIST.md` (detailed steps)
- [ ] Verify training/config.yaml one more time
- [ ] Make sure laptop has repo cloned locally (backup copy)

### At Venue (11:30 AM):
- [ ] Find GPU
- [ ] Follow PART1_QUICK_SUMMARY.md steps 1-3
- [ ] Start training at 12:00 PM
- [ ] Follow post-training steps at 7:30 PM
- [ ] Curves ready by 8:00 PM

**That's the entire PART 1 plan. Nothing more complicated than that.**