jonghhhh Claude Sonnet 4.5 committed on
Commit
90aed8a
·
1 Parent(s): afd99b0

Enhance README.md with comprehensive documentation

- Add system architecture Mermaid diagram showing 4-step RAG process
- Include detailed data structure breakdown (166,721 respondents)
- Add complete API documentation with curl examples
- Provide response examples for Actual and Synthetic modes
- Add 6 research application categories
- Include performance metrics and technical stack details
- Add professional badges and enhanced formatting

Based on NBS_System_Guide.md technical documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Files changed (1)
  1. README.md +340 -37
README.md CHANGED
@@ -5,63 +5,366 @@ colorFrom: blue
  colorTo: purple
  sdk: docker
  pinned: false
  ---

- # NBS Persona Survey System

- A persona-avatar survey simulation API built on National Barometer Survey (NBS, 전국지표조사) data

- ## Features

- - **Actual respondent mode**: simulation based on about 160,000 real respondent records
- - **Statistics-based mode**: simulation that reflects the statistical tendencies of a specific group
- - **RAG-based response generation**: produces consistent answers by referencing past response history

- ## Data

- - Total respondents: 166,721
- - Survey rounds: 163 (2020 to present)
- - Unique questions: 1,219

- ## API Usage

  ### Health Check

  ```bash
- GET /health
  ```

- ### Actual Respondent Simulation
- ```bash
- POST /simulate/actual
  {
-   "question": "정년연장에 대해 어떻게 생각하십니까?",
-   "gender": "남자",
-   "age": "31~40",
-   "region": "서울",
-   "job": "사무/기술직",
-   "sample": 5
  }
  ```

- ### Statistics-Based Simulation
  ```bash
- POST /simulate/synthetic
- {
-   "question": "기본소득제 도입에 대한 의견은?",
-   "gender": "여자",
-   "age": "20~29",
-   "region": "경기",
-   "sample": 3
- }
  ```

- ## Tech Stack

- - FastAPI
- - Sentence Transformers (KR-SBERT)
- - Google Gemini 2.5 Flash
- - Pandas + PyArrow (Parquet)

- ## License

- The data is based on the original National Barometer Survey (NBS) data.
 
  colorTo: purple
  sdk: docker
  pinned: false
+ license: apache-2.0
+ tags:
+ - computational-social-science
+ - survey-simulation
+ - rag
+ - persona
+ - korean
  ---

+ # 🎭 NBS Persona Survey System

+ [![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-blue)](https://huggingface.co/spaces/jonghhhh/persona_survey)
+ [![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://www.python.org/downloads/)
+ [![FastAPI](https://img.shields.io/badge/FastAPI-0.128.0-009688.svg)](https://fastapi.tiangolo.com/)
+ [![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)

+ A **persona-avatar survey simulation** API built on National Barometer Survey (NBS, 전국지표조사) data. It recreates real people who took part in past opinion polls as digital avatars and predicts how they would react to new social issues.

+ ## 🌟 Key Features

+ - **📊 Large-scale empirical data**: 166,721 real respondents, 163 survey rounds, 1,219 unique questions
+ - **🤖 RAG-based simulation**: retrieves past response history and uses it as LLM context
+ - **🎯 Precise targeting**: simulations segmented by gender, age, region, and occupation
+ - **⚡ Lightweight system**: responses from 160,000+ respondents compressed into about 13MB of Parquet files
+ - **🔄 Two modes**: individual respondents (Actual) vs. statistical aggregates (Synthetic)

+ ---
+
+ ## 📚 Table of Contents
+
+ - [How It Works](#-how-it-works)
+ - [Data Structure](#-data-structure)
+ - [API Usage](#-api-usage)
+ - [Response Examples](#-response-examples)
+ - [Tech Stack](#-tech-stack)
+ - [Research Applications](#-research-applications)
+ - [License](#-license)
+
+ ---
+
+ ## 🧠 How It Works
+
+ ### 1. System Architecture
+
+ ```mermaid
+ graph TB
+ A[User question + filters] --> B{1. Demographic filtering}
+ B --> C[Extract matching respondents<br/>out of 166,721]
+
+ A --> D{2. Semantic search}
+ D --> E[Top-5 similar questions<br/>out of 1,219]
+
+ C --> F{3. Context assembly}
+ E --> F
+ F --> G[Respondent past answers<br/>+ values profile]
+
+ G --> H[4. LLM generation]
+ H --> I[Simulated answer]
+ ```
+
+ ### 2. Core Data Files
+
+ | File | Size | Role | Structure |
+ |------|------|------|------|
+ | `nbs_questions_index.parquet` | 6.2MB | Question search index | 1,219 questions × 768-dimensional vectors |
+ | `consolidated_nbs_data.parquet` | 6.7MB | Respondent database | 166,721 respondents × 1,219 questions (sparse matrix) |
+
+ **Workflow:**
+ 1. **Question search**: find similar questions in `nbs_questions_index.parquet` using KR-SBERT embeddings
+ 2. **Respondent filtering**: extract the respondents who match the filters from `consolidated_nbs_data.parquet`
+ 3. **Context building**: combine the selected respondents' past answers with their political-orientation and values data
+ 4. **LLM simulation**: generate consistent answers with Gemini 2.5 Flash
+
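The question-search step can be illustrated with a small, dependency-free sketch. It is not the project's code: the names `cosine_similarity`, `top_k_questions`, and `toy_index` are hypothetical, and the 3-dimensional vectors stand in for the 768-dimensional KR-SBERT embeddings stored in `nbs_questions_index.parquet`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def top_k_questions(query_vec, index, k=5):
    """Return the k (question, score) pairs most similar to query_vec.

    `index` is a hypothetical list of (question_text, embedding) pairs
    standing in for the rows of the question index file.
    """
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings (assumption: real vectors have 768 dimensions).
toy_index = [
    ("retirement age", [0.9, 0.1, 0.0]),
    ("basic income", [0.1, 0.9, 0.0]),
    ("elderly employment policy", [0.8, 0.2, 0.1]),
]
results = top_k_questions([1.0, 0.0, 0.0], toy_index, k=2)
```

With 1,219 questions a brute-force scan like this is cheap, which may be why the stack lists plain NumPy and scikit-learn rather than a vector database.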
+ ### 3. RAG (Retrieval-Augmented Generation)
+
+ To reduce LLM hallucination, **answers are generated with real survey data as grounding**:
+
+ ```python
+ # Example: simulating a question about extending the retirement age ("정년연장")
+ # 1. Retrieve similar questions -> "elderly employment policy", "retirement age", "old-age income security"
+ # 2. Sample 5 respondents: self-employed men in their 40s living in Seoul
+ # 3. Past answers of each respondent:
+ #    - Respondent A: "expand welfare policy -> oppose (tax burden)", "political orientation -> conservative"
+ #    - Respondent B: "expand welfare policy -> support (protect ordinary people)", "political orientation -> progressive"
+ # 4. The LLM uses this information to answer the retirement-age question from each respondent's standpoint
+ ```
+
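As a rough illustration of how retrieved answers could be turned into LLM context, here is a minimal sketch. The function `build_prompt` and the prompt wording are assumptions made for this example; the source does not show the actual prompt sent to Gemini.

```python
def build_prompt(question, demographics, past_answers):
    """Assemble a prompt grounded in one respondent's survey history.

    past_answers is a list of (past_question, answer) pairs; in the real
    system these would come from the retrieved similar questions.
    """
    profile = ", ".join(f"{key}: {value}" for key, value in demographics.items())
    history = "\n".join(f"- {q} -> {a}" for q, a in past_answers)
    return (
        "You are answering a survey as the following respondent.\n"
        f"Profile: {profile}\n"
        "Past answers:\n"
        f"{history}\n"
        f"New question: {question}\n"
        "Answer consistently with the past answers above."
    )


prompt = build_prompt(
    "What do you think about extending the retirement age?",
    {"gender": "male", "age": 45, "region": "Seoul", "job": "self-employed"},
    [
        ("Expanding welfare policy", "oppose (tax burden)"),
        ("Political orientation", "conservative"),
    ],
)
```

Keeping the respondent's own answers in the prompt is what ties the generated text to observed data instead of to the model's priors about a demographic group.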
+ ---
+
+ ## 📂 Data Structure
+
+ ### Demographic Distribution
+
+ ```
+ Total respondents: 166,721 (2020-2025)
+ Survey rounds: 163
+ Unique questions: 1,219
+
+ Gender:
+   Male (남자): 50.1% (83,561)
+   Female (여자): 49.9% (83,160)
+
+ Age:
+   Mean: 50.0 years (standard deviation 17.4 years)
+   Range: 18 to 99 years

+ Region (Top 5):
+   Gyeonggi (경기): 25.5% (42,513)
+   Seoul (서울): 18.5% (30,843)
+   Busan (부산): 6.9% (11,504)
+   Gyeongnam (경남): 6.5% (10,837)
+   Incheon (인천): 5.7% (9,503)
+
+ Occupation (Top 4):
+   Office/technical (사무/기술직): 25.2% (42,013)
+   Homemaker (주부): 18.2% (30,343)
+   Self-employed (자영업): 15.7% (26,175)
+   Unemployed/retired (무직/퇴직): 14.7% (24,508)
+ ```
+
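The demographic fields summarized above are what the API's filter parameters select on. The sketch below shows one plausible way to apply such filters to in-memory records. `parse_age` and `filter_respondents` are hypothetical names, and treating a single age such as "40" as an exact match is an assumption; the service may interpret it differently.

```python
def parse_age(age):
    """Turn "40" into (40, 40) and "30~40" into (30, 40)."""
    if "~" in age:
        low, high = age.split("~", 1)
        return int(low), int(high)
    return int(age), int(age)


def filter_respondents(records, gender=None, age=None, region=None, job=None):
    """Keep the records that match every filter that is not None."""
    age_range = parse_age(age) if age is not None else None
    matched = []
    for rec in records:
        if gender is not None and rec["gender"] != gender:
            continue
        if region is not None and rec["region"] != region:
            continue
        if job is not None and rec["job"] != job:
            continue
        # Assumption: age bounds are inclusive on both ends.
        if age_range is not None and not (age_range[0] <= rec["age"] <= age_range[1]):
            continue
        matched.append(rec)
    return matched


# Toy records using the Korean category values found in the dataset.
records = [
    {"id": 1, "gender": "남자", "age": 45, "region": "서울", "job": "자영업"},
    {"id": 2, "gender": "여자", "age": 27, "region": "경기", "job": "학생"},
    {"id": 3, "gender": "여자", "age": 24, "region": "경기", "job": "사무/기술직"},
]
young_women = filter_respondents(records, gender="여자", age="20~29", region="경기")
```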
+ ### Sparse Matrix Structure
+
+ Each respondent answered **only about 20 to 30** of the 1,219 questions on average, and Parquet's efficient compression stores all of the data in about 13MB.
+
+ ```
+ Respondent #12345:
+ - survey_round: 142
+ - gender: 남자 (male)
+ - age: 45
+ - region: 서울 (Seoul)
+ - job: 자영업 (self-employed)
+ - 대통령_국정운영_평가 (presidential job approval): "긍정" (positive) ✅ answered
+ - 정년연장_찬반 (retirement-age extension, for/against): "찬성" (support) ✅ answered
+ - 기본소득제_찬반 (basic income, for/against): NaN ❌ not asked
+ - ... (1,216 more questions)
+ ```
+
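To make the sparse layout concrete, the following sketch pulls out only the questions a respondent actually answered. The helper names and the `DEMOGRAPHIC_FIELDS` set are assumptions for illustration; the column names mirror the example record above.

```python
import math

# Assumption: these columns describe the respondent rather than an answer.
DEMOGRAPHIC_FIELDS = {"survey_round", "gender", "age", "region", "job"}


def is_missing(value):
    """True for None or a float NaN, the usual markers for 'not asked'."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def answered_questions(record):
    """Return {question: answer} for the questions this respondent answered."""
    return {
        key: value
        for key, value in record.items()
        if key not in DEMOGRAPHIC_FIELDS and not is_missing(value)
    }


record = {
    "survey_round": 142,
    "gender": "남자",
    "age": 45,
    "region": "서울",
    "job": "자영업",
    "대통령_국정운영_평가": "긍정",
    "정년연장_찬반": "찬성",
    "기본소득제_찬반": float("nan"),
}
answers = answered_questions(record)
```

Because most cells are empty, dropping the missing values first keeps the context passed to the LLM short.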
+ ---
+
+ ## 🚀 API Usage

  ### Health Check
+
  ```bash
+ curl https://jonghhhh-persona-survey.hf.space/health
  ```

+ **Response:**
+ ```json
  {
+   "status": "healthy",
+   "total_records": 166721
  }
  ```

+ ---
+
+ ### Actual Respondent Simulation (Actual Mode)
+
+ Simulates answers from the specific past responses of **individual respondents**.
+
  ```bash
+ curl -X POST "https://jonghhhh-persona-survey.hf.space/simulate/actual" \
+   -H "Content-Type: application/json" \
+   -H "X-API-Key: YOUR_GEMINI_API_KEY" \
+   -d '{
+     "question": "정년연장에 대해 어떻게 생각하십니까?",
+     "gender": "남자",
+     "age": "40",
+     "region": "서울",
+     "job": "사무/기술직",
+     "sample": 3
+   }'
  ```

+ **Parameters:**
+ - `question` (required): the question to simulate (the example asks "What do you think about extending the retirement age?")
+ - `gender` (optional): "남자" (male) or "여자" (female)
+ - `age` (optional): a single age such as "40" or a range such as "30~40"
+ - `region` (optional): 서울, 경기, 부산, 대구, 광주, 대전, 울산, 세종, 강원, 충북, 충남, 전북, 전남, 경북, 경남, 제주 (Seoul, Gyeonggi, Busan, Daegu, Gwangju, Daejeon, Ulsan, Sejong, Gangwon, Chungbuk, Chungnam, Jeonbuk, Jeonnam, Gyeongbuk, Gyeongnam, Jeju)
+ - `job` (optional): 학생 (student), 사무/기술직 (office/technical), 자영업 (self-employed), 주부 (homemaker), 경영/관리/전문직 (executive/managerial/professional), 생산/기능/노무직 (production/skilled/manual labor), 농/림/수산업 (agriculture/forestry/fisheries), 무직/퇴직/기타 (unemployed/retired/other)
+ - `sample` (required): number of avatars to generate (recommended: 3 to 10)
+
+ **Headers:**
+ - `X-API-Key`: Gemini API key (required)
+
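For callers who prefer Python to curl, the request above could be built with the standard library as sketched below. `build_simulation_request` is a hypothetical helper, not part of the project; only the URL, paths, header name, and payload fields come from the README. The sketch builds the request without sending it.

```python
import json
import urllib.request

BASE_URL = "https://jonghhhh-persona-survey.hf.space"


def build_simulation_request(mode, api_key, question, sample, **filters):
    """Build (but do not send) a POST request for /simulate/<mode>."""
    if mode not in ("actual", "synthetic"):
        raise ValueError("mode must be 'actual' or 'synthetic'")
    payload = {"question": question, "sample": sample}
    # Optional filters (gender, age, region, job) are added only when given.
    payload.update({k: v for k, v in filters.items() if v is not None})
    return urllib.request.Request(
        f"{BASE_URL}/simulate/{mode}",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


request = build_simulation_request(
    "actual",
    "YOUR_GEMINI_API_KEY",
    "정년연장에 대해 어떻게 생각하십니까?",
    3,
    gender="남자",
    age="40",
    region="서울",
    job="사무/기술직",
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Reading the key from an environment variable instead of hard-coding it would keep it out of source control.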
+ ---
+
+ ### Statistics-Based Simulation (Synthetic Mode)
+
+ Simulates answers that reflect the statistical tendencies of the **group as a whole**. The example below asks for opinions on introducing a basic income.
+
+ ```bash
+ curl -X POST "https://jonghhhh-persona-survey.hf.space/simulate/synthetic" \
+   -H "Content-Type: application/json" \
+   -H "X-API-Key: YOUR_GEMINI_API_KEY" \
+   -d '{
+     "question": "기본소득제 도입에 대한 의견은?",
+     "gender": "여자",
+     "age": "20~29",
+     "region": "경기",
+     "sample": 3
+   }'
+ ```
+
+ **Differences:**
+ - **Actual**: each respondent's individual history → diverse answers
+ - **Synthetic**: statistics on the group's answer distribution → representative answers
+
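The group-level statistics that Synthetic mode relies on can be pictured as a simple answer distribution. This is a sketch under assumptions: `answer_distribution` and `describe` are hypothetical names, and the toy answers are chosen only to show the arithmetic.

```python
from collections import Counter


def answer_distribution(answers):
    """Return {answer: percentage} for a list of answers, skipping None."""
    counted = Counter(a for a in answers if a is not None)
    total = sum(counted.values())
    if total == 0:
        return {}
    return {answer: round(100 * n / total, 1) for answer, n in counted.most_common()}


def describe(topic, distribution):
    """Format a distribution as one context line for the LLM prompt."""
    parts = ", ".join(f"{answer} {pct:g}%" for answer, pct in distribution.items())
    return f"{topic} → {parts}"


# Toy group: 20 people answered, 5 were never asked (None).
group_answers = ["support"] * 15 + ["oppose"] * 3 + ["unsure"] * 2 + [None] * 5
distribution = answer_distribution(group_answers)
summary = describe("Welfare policy expansion", distribution)
```

Percentages are computed over respondents who answered, so the sparse, never-asked cells do not dilute the distribution.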
+ ---
+
+ ## 📋 Response Examples
+
+ ### Actual Mode Response
+
+ ```json
+ [
+   {
+     "respondent_id": "12345",
+     "survey_round": 142,
+     "demographics": {
+       "gender": "남자",
+       "region": "서울",
+       "age": 45,
+       "job": "사무/기술직"
+     },
+     "referenced_context": [
+       "Welfare policy expansion → oppose (concern about tax burden)",
+       "Political orientation → conservative",
+       "Economic growth first → support"
+     ],
+     "response": "I think extending the retirement age calls for a cautious approach. If companies' labor costs rise, jobs for young people could shrink, and the shortfall would likely end up being covered by taxes. Supporting individuals' own retirement preparation seems like the better direction."
+   }
+ ]
+ ```
+
+ ### Synthetic Mode Response
+
+ ```json
+ [
+   {
+     "avatar_id": "syn_0",
+     "demographics": {
+       "gender": "여자",
+       "age": "20~29",
+       "region": "경기",
+       "job": null
+     },
+     "referenced_stat_context": [
+       "Welfare policy expansion → support 75%, oppose 15%, don't know 10%",
+       "Basic-income-related questions → positive response 62%",
+       "Political orientation distribution → progressive 45%, moderate 38%, conservative 17%"
+     ],
+     "response": "As a woman in my 20s, I think basic income deserves positive consideration given the unstable employment situation facing young people. That said, it will only be effective if it is discussed together with how to fund it."
+   }
+ ]
+ ```
+
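A client might parse the Actual mode response along these lines. The `ActualAvatar` dataclass and `parse_actual_response` are hypothetical; the field names are taken from the example response above, and the sketch assumes the live API returns the same shape.

```python
import json
from dataclasses import dataclass


@dataclass
class ActualAvatar:
    respondent_id: str
    survey_round: int
    demographics: dict
    referenced_context: list
    response: str


def parse_actual_response(body):
    """Parse the JSON array returned by /simulate/actual into dataclasses."""
    return [
        ActualAvatar(
            respondent_id=item["respondent_id"],
            survey_round=item["survey_round"],
            demographics=item["demographics"],
            referenced_context=item["referenced_context"],
            response=item["response"],
        )
        for item in json.loads(body)
    ]


sample_body = json.dumps([
    {
        "respondent_id": "12345",
        "survey_round": 142,
        "demographics": {"gender": "남자", "region": "서울", "age": 45, "job": "사무/기술직"},
        "referenced_context": ["Political orientation → conservative"],
        "response": "I think extending the retirement age calls for a cautious approach.",
    }
])
avatars = parse_actual_response(sample_body)
```

Reading fields by name rather than unpacking the whole object means extra fields added by the server later would not break the parser.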
+ ---
+
+ ## 🛠️ Tech Stack
+
+ ### Backend
+ - **FastAPI** - high-performance asynchronous API framework
+ - **Uvicorn** - ASGI server
+
+ ### Data Processing
+ - **Pandas** - data manipulation and filtering
+ - **PyArrow** - Parquet file handling
+ - **NumPy** - vector operations and cosine similarity
+
+ ### AI/ML
+ - **Sentence Transformers** - KR-SBERT embedding model (`snunlp/KR-SBERT-V40K-klueNLI-augSTS`)
+ - **Google Gemini 2.5 Flash** - LLM text generation
+ - **scikit-learn** - cosine similarity computation
+
+ ### Infrastructure
+ - **Docker** - containerized deployment
+ - **Hugging Face Spaces** - free hosting
+
+ ---
+
+ ## 🔬 Research Applications
+
+ ### 1. Computational Social Science
+ - **Synthetic population** generation
+ - Methodology combining large-scale survey data with LLMs
+ - Low-cost tool for exploring public opinion in advance
+
+ ### 2. Policy Simulation
+ - Predicting reactions to a specific policy proposal by generation and region
+ - Rapid prototyping before A/B testing
+ - Target group optimization
+
+ ### 3. Time-Series Analysis
+ - Tracking changes in the values of the same demographic group (2020-2025)
+ - Analyzing patterns of opinion change on social issues
+
+ ### 4. Cross-Cultural Research
+ - Combining Korean survey data with data from other countries
+ - Comparative studies of personas across countries
+
+ ### 5. Marketing & UX
+ - Predicting the preferences of target customer segments
+ - Simulating reactions to products and services
+ - Automating user persona creation
+
+ ### 6. Education
+ - Simulations for experiencing diverse perspectives
+ - Tool for understanding social diversity
+
+ ---
+
+ ## 📊 System Performance
+
+ - **Response time**: 3 to 5 seconds (excluding embedding model load time)
+ - **Concurrency**: asynchronous request handling with FastAPI
+ - **Data size**: 13MB (Parquet compression)
+ - **Memory usage**: about 500MB (including the loaded model)
+
+ ---
+
+ ## 🔒 API Key Setup
+
+ Pass the Gemini API key as a request header:
+
+ ```bash
+ -H "X-API-Key: YOUR_GEMINI_API_KEY"
+ ```
+
+ Get an API key: [Google AI Studio](https://makersuite.google.com/app/apikey)
+
+ ---
+
+ ## 📄 License
+
+ - **Code**: Apache 2.0
+ - **Data**: based on the original National Barometer Survey (NBS) data
+ - **Model**: KR-SBERT (CC-BY-SA-4.0)
+
+ ---
+
+ ## 📞 Contact and Contributing
+
+ - **GitHub**: [Repository Link]
+ - **Hugging Face**: [jonghhhh/persona_survey](https://huggingface.co/spaces/jonghhhh/persona_survey)
+ - **Email**: jonghhhh@khu.ac.kr
+
+ ---
+
+ ## 🙏 Acknowledgments
+
+ - **National Barometer Survey (NBS)**: source data
+ - **KR-SBERT**: Korean sentence embedding model
+ - **Hugging Face**: free hosting platform
+
+ ---

+ <div align="center">

+ **Built with ❤️ for Computational Social Science Research**

+ </div>