# Labeling Pipeline (CLIP Text Labels)

Data based on NEXON Open API.

## Overview
This pipeline generates CLIP-ready text labels for MapleStory item icons using Qwen2-VL.
It consumes either a manifest file or the SQLite DB and writes:
- `labels.jsonl` (one JSON record per image)
- `labels.parquet` (optional)

## Requirements
- Python 3.11+
- GPU recommended for Qwen2-VL inference

## Install
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Optional (for 4-bit quantization):
```bash
pip install bitsandbytes
```

## Input Adapters
You can use one of the following:

A) Manifest (recommended)
- `data/<DATE>/manifest.parquet` or `manifest.jsonl`
- Required columns: `image_path`, `item_name`, `source_type`

B) SQLite DB
- `data/<DATE>/db.sqlite`
- Joins `equipment_shape_items` / `cash_items` with `icon_assets`
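
For the manifest option (A), it can help to check the required columns before starting a long GPU run. The following is a minimal sketch, not part of the pipeline itself: the helper names `missing_columns` and `validate_manifest_lines` are hypothetical, and it assumes a `manifest.jsonl` with one JSON object per line.

```python
import json
from typing import Iterable

# Required manifest columns, as listed for option A above.
REQUIRED_COLUMNS = ("image_path", "item_name", "source_type")


def missing_columns(record: dict) -> list[str]:
    """Return the required columns that are absent from one manifest record."""
    return [col for col in REQUIRED_COLUMNS if col not in record]


def validate_manifest_lines(lines: Iterable[str]) -> list[tuple[int, list[str]]]:
    """Check each JSONL line; return (line_number, missing_columns) for bad rows.

    Hypothetical helper for illustration; blank lines are skipped.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        missing = missing_columns(json.loads(line))
        if missing:
            problems.append((line_number, missing))
    return problems
```

To check a file, pass an open handle, for example `validate_manifest_lines(open("data/2026-01-10/manifest.jsonl", encoding="utf-8"))`. An empty result means every row has the three required columns.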

## Run
```bash
python -m labeler run \
  --input data/2026-01-10/manifest.parquet \
  --outdir data/2026-01-10/labels \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --device auto \
  --batch-size 8 \
  --upscale 2 \
  --resume
```

Using DB input:
```bash
python -m labeler run \
  --db data/2026-01-10/db.sqlite \
  --outdir data/2026-01-10/labels \
  --quality-retry
```

Filter by `run_id` (optional):
```bash
python -m labeler run --db data/2026-01-10/db.sqlite --run-id <RUN_ID>
```

## Output Schema
Each line in `labels.jsonl` is a JSON object:

```json
{
  "image_path": "...",
  "image_sha256": "...",
  "source_type": "equipment_shape" | "cash",
  "item_name": "...",
  "item_description": "...",
  "label_ko": "...",
  "label_en": "...",
  "tags_ko": ["..."],
  "attributes": {
    "colors": ["..."],
    "theme": ["..."],
    "material": ["..."],
    "vibe": ["..."],
    "item_type_guess": "..."
  },
  "query_variants_ko": ["..."],
  "quality_flags": {
    "is_uncertain": true,
    "reasons": ["too_small", "ambiguous_icon"]
  },
  "model": "Qwen/Qwen2-VL-2B-Instruct",
  "prompt_version": "v2",
  "generated_at": "ISO-8601"
}
```
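
As a rough illustration of consuming this schema, the sketch below parses JSONL lines and keeps the records flagged as uncertain. The function names are hypothetical and not part of the `labeler` package; only the field names (`quality_flags`, `is_uncertain`) come from the schema above.

```python
import json
from typing import Iterable, Iterator


def iter_labels(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def uncertain_records(records: Iterable[dict]) -> list[dict]:
    """Keep records whose quality_flags.is_uncertain is true.

    A missing or null quality_flags object is treated as "not uncertain".
    """
    return [
        rec for rec in records
        if (rec.get("quality_flags") or {}).get("is_uncertain", False)
    ]
```

Assuming `labels.jsonl` sits directly inside the `--outdir` directory, usage might look like `uncertain_records(iter_labels(open("data/2026-01-10/labels/labels.jsonl", encoding="utf-8")))`.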

## Prompt Versioning
- Prompt version is stored as `prompt_version` in each record.
- Current version: `v2` (see `src/labeler/prompts.py`).
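
Because each record carries its own `prompt_version`, records produced by an older prompt can be found after the fact. The sketch below is one possible way to do that on already-parsed records; the helper names and the `"<missing>"` placeholder are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable

# Current prompt version as stated above.
CURRENT_PROMPT_VERSION = "v2"


def prompt_version_counts(records: Iterable[dict]) -> Counter:
    """Count records per prompt_version; absent values count as '<missing>'."""
    return Counter(rec.get("prompt_version", "<missing>") for rec in records)


def stale_image_paths(
    records: Iterable[dict], current: str = CURRENT_PROMPT_VERSION
) -> list[str]:
    """Return image_path for every record not labeled with the current prompt."""
    return [
        rec["image_path"]
        for rec in records
        if rec.get("prompt_version") != current
    ]
```

The returned paths could then be fed into a new manifest to re-label only the stale images.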

## Quality
- For higher-quality labels (more visual descriptors), use `--quality-retry`.

## Resume / Idempotency
- If `labels.jsonl` already exists, use `--resume`.
- The pipeline skips images already labeled by `image_path` or `image_sha256`.
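
The skip rule can be pictured as a lookup against two sets built from the existing `labels.jsonl`. This is a sketch of the idea only, not the pipeline's actual code: it assumes `image_sha256` is the hex-encoded SHA-256 of the image file's bytes, and all function names are hypothetical.

```python
import hashlib
from typing import Iterable


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def build_seen_index(records: Iterable[dict]) -> tuple[set[str], set[str]]:
    """Collect the image paths and hashes that already have a label."""
    seen_paths: set[str] = set()
    seen_hashes: set[str] = set()
    for rec in records:
        if rec.get("image_path"):
            seen_paths.add(rec["image_path"])
        if rec.get("image_sha256"):
            seen_hashes.add(rec["image_sha256"])
    return seen_paths, seen_hashes


def should_skip(
    image_path: str,
    image_bytes: bytes,
    seen_paths: set[str],
    seen_hashes: set[str],
) -> bool:
    """Skip when either the path or the content hash was labeled before."""
    return image_path in seen_paths or sha256_hex(image_bytes) in seen_hashes
```

Matching on the hash as well as the path means a renamed or moved icon with identical bytes is not labeled twice.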

## Comparisons
To compare labeling modes, disable one input at a time:
- `--no-image` (metadata only)
- `--no-metadata` (image only)
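
One way to inspect the effect of these modes is to run the pipeline twice with different `--outdir` values and diff the resulting labels. The sketch below assumes both outputs have already been parsed into lists of records; the helper names are hypothetical and `label_ko` is chosen only as an example field.

```python
from typing import Iterable, Optional


def index_by_path(records: Iterable[dict]) -> dict[str, dict]:
    """Map image_path to its record (the last record wins on duplicates)."""
    return {rec["image_path"]: rec for rec in records}


def label_differences(
    run_a: Iterable[dict], run_b: Iterable[dict], field: str = "label_ko"
) -> list[tuple[str, Optional[str], Optional[str]]]:
    """List (image_path, value_a, value_b) where the two runs disagree.

    Only images present in both runs are compared.
    """
    a, b = index_by_path(run_a), index_by_path(run_b)
    shared = sorted(a.keys() & b.keys())
    return [
        (path, a[path].get(field), b[path].get(field))
        for path in shared
        if a[path].get(field) != b[path].get(field)
    ]
```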

## Example Output (3 lines)
```json
{"image_path":"icons/equipment_shape/abc.png","image_sha256":"sha...","source_type":"equipment_shape","item_name":"Sample Hat","item_description":null,"label_ko":"샘플 모자 아이콘, 붉은 색감","label_en":null,"tags_ko":["모자","붉은","아이콘","장비","캐릭터"],"attributes":{"colors":["red"],"theme":["fantasy"],"material":["cloth"],"vibe":["cute"],"item_type_guess":"hat"},"query_variants_ko":["샘플 모자","붉은 모자 아이콘","메이플 모자"],"quality_flags":{"is_uncertain":false,"reasons":[]},"model":"Qwen/Qwen2-VL-2B-Instruct","prompt_version":"v2","generated_at":"2026-01-10T00:00:00Z"}
{"image_path":"icons/cash/def.png","image_sha256":"sha...","source_type":"cash","item_name":"Sample Cape","item_description":"Example","label_ko":"샘플 망토 아이콘, 푸른 계열","label_en":null,"tags_ko":["망토","푸른","코디","캐시","아이콘"],"attributes":{"colors":["blue"],"theme":["classic"],"material":["silk"],"vibe":["elegant"],"item_type_guess":"cape"},"query_variants_ko":["푸른 망토","샘플 망토 아이콘","메이플 캐시 망토"],"quality_flags":{"is_uncertain":false,"reasons":[]},"model":"Qwen/Qwen2-VL-2B-Instruct","prompt_version":"v2","generated_at":"2026-01-10T00:00:00Z"}
{"image_path":"icons/equipment_shape/ghi.png","image_sha256":"sha...","source_type":"equipment_shape","item_name":"Sample Sword","item_description":null,"label_ko":"샘플 검 아이콘, 금속 느낌","label_en":null,"tags_ko":["검","무기","금속","아이콘","장비"],"attributes":{"colors":["silver"],"theme":["fantasy"],"material":["metal"],"vibe":["sharp"],"item_type_guess":"sword"},"query_variants_ko":["샘플 검","메이플 검 아이콘","금속 검"],"quality_flags":{"is_uncertain":false,"reasons":[]},"model":"Qwen/Qwen2-VL-2B-Instruct","prompt_version":"v2","generated_at":"2026-01-10T00:00:00Z"}
```