# PROTEUS Command Usage
This is the command guide for PROTEUS, a grid arena that measures whether an LLM **reads the motives** of other agents.
It is organized step by step so that first-time users can follow along. All commands are run through the `proteus` CLI.
> **One-line summary**: an LLM or a human plays the game → the record (trace) is saved as JSONL → replay it as text, truecolor, PNG, or GIF → collect several traces and compare them.
---
## 0. Prerequisites: virtual environment (.venv)
In this project `python` is not on PATH; commands point directly at the **`.venv`** inside the repository.
Every command starts with **`.venv/bin/python`** instead of `python`.
```bash
# from the working directory
.venv/bin/python -m proteus --help
```
If `.venv` does not exist (on a new machine, for example), recreate it like this:
```bash
uv venv --python 3.12 .venv
uv pip install --python .venv/bin/python \
"pydantic>=2" "numpy>=1.26" "pyyaml>=6" "pytest>=8" "matplotlib>=3.8"
```
> Note: LLM provider SDKs (openai/anthropic, etc.) are **deliberately not installed in `.venv`** (offline invariant).
> A separate temporary venv is used only when running real models (see section 9, near the end).
---
## 1. Listing scenarios: `list-scenarios`
Prints the names of the registered scenarios.
```bash
.venv/bin/python -m proteus list-scenarios
```
Currently there is only one: `predator_evade` (predator evasion = survival motive).
---
## 2. Playing as a human: `play`
A **human plays the same game from the keyboard**. Because the human sees **exactly the same (ASCII) screen** as the LLM,
human traces and LLM traces can be compared fairly.
```bash
.venv/bin/python -m proteus play \
--scenario predator_evade \
--difficulty easy \
--seed 42 \
--play-turns 10 \
--out runs/me.jsonl
```
- Each turn, enter one of `up` / `down` / `left` / `right` / `stay`. (`w/a/s/d` shortcuts also work; case and whitespace are ignored.)
- `up` decreases the row and `down` increases it. `left/right` move along the column axis.
- Invalid input is prompted for again.

**Main options**

| Option | Description | Default |
|--------|-------------|---------|
| `--scenario` | Scenario name | `predator_evade` |
| `--difficulty` | Difficulty: `easy` / `medium` / `hard` / `expert` | `easy` |
| `--seed` | Seed that fixes the world (same seed = same map) | none |
| `--play-turns` | Number of turns to play (survival budget) | `15` |
| `--probe` | Also ask a comprehension question (probe) each turn | off |
| `--out` | JSONL path for saving the trace (optional) | not saved |

> **Tip**: You can also auto-play by piping the inputs in ahead of time.
> `printf 'up\nup\nleft\n' | .venv/bin/python -m proteus play --scenario predator_evade --seed 42 --play-turns 3 --out runs/me.jsonl`
---
## 3. Letting an LLM play: `run`
The specified model plays the game and leaves a trace. `--out` is **required**.
```bash
# offline smoke test (fake model), no network needed
.venv/bin/python -m proteus run \
--scenario predator_evade \
--model fake:demo \
--difficulty easy \
--seed 42 \
--play-turns 10 \
--out runs/llm.jsonl
```
- `--model` takes the form `name:model`. **`fake:<any-name>`** is an offline fake model (for tests and demos).
- For real models (openai/anthropic/gemini/ollama, etc.), see section 9 near the end.
- When the run finishes, it prints a result summary (survived/captured, motive_reading_accuracy, reactivity_index).

**Main options** (nearly identical to `play`)

| Option | Description | Default |
|--------|-------------|---------|
| `--model` | Provider spec `name:model` **(required)** | none |
| `--no-probe` | **Turn off** the per-turn probe question | probe on |
| `--out` | JSONL path for the trace **(required)** | none |

> In `run` the probe is **on by default**; in `play` it is **off by default** (a question every turn is tedious for a human).
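
As an illustration of the `name:model` format, the sketch below splits a spec on the first colon only, so a model name that itself contains a colon (such as `gpt-oss:120b-cloud`) stays intact. The function name `parse_model_spec` is hypothetical, and splitting on the first colon is an assumption about how the CLI behaves, not something the guide states.

```python
from typing import Tuple


def parse_model_spec(spec: str) -> Tuple[str, str]:
    """Split 'name:model' into (provider name, model name).

    Assumption: only the first colon separates the provider from the model,
    so the model part may contain further colons.
    """
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'name:model', got {spec!r}")
    return provider, model
```

Under this assumption, `parse_model_spec("fake:demo")` yields `("fake", "demo")`, and `parse_model_spec("ollama:gpt-oss:120b-cloud")` yields `("ollama", "gpt-oss:120b-cloud")`.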
---
## 4. Replaying a trace: `replay`
Replays a saved trace (JSONL) in several ways.
### 4-1. Text (default)
Prints, for each turn, the action taken versus the motive and habit actions, whether it was correct, the reward, and finally the metrics.
```bash
.venv/bin/python -m proteus replay runs/me.jsonl
```
### 4-2. Truecolor terminal playback: `--visual`
Draws the grid as 24-bit color blocks, with a side panel showing action/motive/habit/reward/tokens/reasoning.
```bash
.venv/bin/python -m proteus replay runs/me.jsonl --visual --fps 0
```
- `--fps 0` : pause on every frame (press Enter to advance)
- `--fps 2` : auto-play at 2 frames per second (the default is 4)
### 4-3. Saving PNG frames: `--png DIR`
Writes one file per frame (`frame_000.png`, `frame_001.png`, …) into the folder.
```bash
.venv/bin/python -m proteus replay runs/me.jsonl --png runs/me_frames
```
---
## 5. Combining PNGs into a GIF
The CLI has no GIF output, but the frames extracted with `--png` can be combined with Pillow (already present in `.venv`).
```bash
# 1) Generate the frames
.venv/bin/python -m proteus replay runs/me.jsonl --png runs/me_frames
# 2) Combine them into a GIF
.venv/bin/python - <<'PY'
from pathlib import Path
from PIL import Image

frames = sorted(Path("runs/me_frames").glob("frame_*.png"))
if not frames:
    raise SystemExit("no frames found in runs/me_frames")
imgs = [Image.open(p).convert("RGBA") for p in frames]
# Composite onto a white background, then convert to a palette (GIF has no alpha channel)
flat = []
for im in imgs:
    bg = Image.new("RGBA", im.size, (255, 255, 255, 255))
    rgb = Image.alpha_composite(bg, im).convert("RGB")  # drop alpha before palettizing
    flat.append(rgb.convert("P", palette=Image.ADAPTIVE))
durations = [600] * len(flat)  # 0.6 s per frame
durations[-1] = 1500  # hold the last frame for 1.5 s
flat[0].save("runs/me.gif", save_all=True, append_images=flat[1:],
             duration=durations, loop=0, disposal=2, optimize=True)
print("wrote runs/me.gif")
PY
# 3) Open it (macOS)
open runs/me.gif
```
> **ffmpeg version** (smoother, higher-quality GIF using a 2-pass palette):
> ```bash
> ffmpeg -y -framerate 2 -i runs/me_frames/frame_%03d.png \
> -vf "palettegen" runs/palette.png
> ffmpeg -y -framerate 2 -i runs/me_frames/frame_%03d.png -i runs/palette.png \
> -lavfi "paletteuse" runs/me.gif
> ```
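
Because ffmpeg's `frame_%03d.png` pattern reads a consecutively numbered sequence, a gap in the frame numbers would cut the GIF short. The sketch below is an optional pre-flight check, not part of PROTEUS: the helper name `missing_frames` is hypothetical, and it assumes the frames start at `frame_000.png` as shown in section 4-3.

```python
import re
from typing import Iterable, List

FRAME_RE = re.compile(r"frame_(\d{3})\.png")


def missing_frames(names: Iterable[str]) -> List[int]:
    """Return the frame indices missing between 0 and the highest index found."""
    indices = set()
    for name in names:
        match = FRAME_RE.fullmatch(name)
        if match:  # ignore files that are not frames
            indices.add(int(match.group(1)))
    if not indices:
        return []
    return [i for i in range(max(indices) + 1) if i not in indices]
```

For a real folder, the names could come from `[p.name for p in Path("runs/me_frames").iterdir()]` (with `from pathlib import Path`); an empty result means the sequence has no gaps.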
---
## 6. Collecting and comparing traces: `compare`
Groups human and LLM traces by `(model, difficulty)` and reports the mean of each metric together with the count (n). It is meant for comparison against a human baseline.
```bash
.venv/bin/python -m proteus compare runs/me.jsonl runs/llm.jsonl --out runs/summary.json
```
- Several JSONL files can be passed at once.
- With `--out`, the aggregated result is also saved as JSON.
- The `model` in the output key is the part of `--model` **after the colon** (e.g. `fake:demo` → `demo`).

> **Fair-comparison tip**: run the human (`play --out`) and the LLM (`run --out`) with **the same `--seed` and `--difficulty`**, then compare.
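
The grouping that `compare` performs can be pictured with the following sketch. It is not the project's code: the record layout (`model`, `difficulty`, `metrics`) is a hypothetical stand-in for whatever the real trace schema contains, and the `human` label in the usage example is likewise an assumption.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def aggregate(records: Iterable[dict]) -> Dict[Tuple[str, str], dict]:
    """Group per-run metric dicts by (model, difficulty); report means and n."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["model"], rec["difficulty"])].append(rec["metrics"])

    summary = {}
    for key, runs in groups.items():
        metric_names = sorted({name for run in runs for name in run})
        summary[key] = {
            "n": len(runs),
            "mean": {
                # average each metric over the runs that report it
                name: mean(run[name] for run in runs if name in run)
                for name in metric_names
            },
        }
    return summary
```

For example, two hypothetical `("human", "easy")` runs with `motive_reading_accuracy` values of 1.0 and 0.5 would produce `n = 2` and a mean of 0.75 for that group.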
---
## 7. Common workflow at a glance
```bash
# (a) Play one round as a human and save it
.venv/bin/python -m proteus play --scenario predator_evade --difficulty easy --seed 42 --play-turns 10 --out runs/me.jsonl
# (b) Replay it in truecolor
.venv/bin/python -m proteus replay runs/me.jsonl --visual --fps 0
# (c) Turn it into a GIF
.venv/bin/python -m proteus replay runs/me.jsonl --png runs/me_frames
# → run the Pillow snippet from section 5 → runs/me.gif
# (d) One round with the (fake) LLM under the same conditions
.venv/bin/python -m proteus run --scenario predator_evade --model fake:demo --difficulty easy --seed 42 --play-turns 10 --out runs/llm.jsonl
# (e) Compare human vs LLM
.venv/bin/python -m proteus compare runs/me.jsonl runs/llm.jsonl --out runs/summary.json
```
---
## 8. Exit codes (error handling)
| Code | Meaning |
|------|---------|
| `0` | Success |
| `1` | Input was found but is empty (e.g. an empty trace passed to `replay`/`compare`) |
| `2` | Invalid or missing argument (unknown model or scenario, file not found, stdin ending early during `play`) |
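
When the CLI is driven from a script, the table above can be turned into readable messages. The following is a minimal sketch under that assumption; the names `EXIT_MEANINGS`, `describe_exit`, and `run_and_describe` are hypothetical helpers and are not part of `proteus`.

```python
import subprocess
from typing import Sequence

# Meanings taken from the exit-code table above.
EXIT_MEANINGS = {
    0: "success",
    1: "input found but empty",
    2: "invalid or missing argument",
}


def describe_exit(code: int) -> str:
    """Translate an exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_and_describe(argv: Sequence[str]) -> str:
    """Run a command without raising on failure and describe its exit code."""
    result = subprocess.run(list(argv), check=False)
    return describe_exit(result.returncode)
```

A call such as `run_and_describe([".venv/bin/python", "-m", "proteus", "replay", "runs/me.jsonl"])` would then report which of the three cases occurred.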
---
## 9. Running with a real model (optional)
Because `.venv` contains no provider SDKs, create a separate **temporary venv** when running a real LLM (this keeps the offline invariant intact).
```bash
# Example: Ollama Cloud
python3 -m venv /tmp/proteus-real && \
  /tmp/proteus-real/bin/pip install pydantic numpy pyyaml httpx && \
  PYTHONPATH="$PWD" OLLAMA_API_KEY="<key>" \
  /tmp/proteus-real/bin/python -m proteus run \
    --scenario predator_evade --model ollama:gpt-oss:120b-cloud \
    --difficulty easy --seed 42 --play-turns 10 --out runs/real.jsonl
```
- The available provider names are listed in the `--model` description of `run --help`.
- API keys and `runs/` are covered by `.gitignore`, so they are not committed.
---
## 10. Help is always available
```bash
.venv/bin/python -m proteus --help # all commands
.venv/bin/python -m proteus run --help # options for one command
.venv/bin/python -m proteus play --help
.venv/bin/python -m proteus replay --help
.venv/bin/python -m proteus compare --help
```
If you have questions, see the design documents in `docs/superpowers/specs/` and `HANDOFF.md`.