bong9513 committed (verified)
Commit 660afd5 · 1 parent: c791ea8

Delete README.md

### Visibility Prediction Modeling Project

This project integrates weather and air-pollution data (ASOS, DataOn) to predict visibility (`visi`). Class imbalance is addressed by augmenting the data with SMOTENC/CTGAN, and GBDT models (LightGBM/XGBoost) are combined with tabular deep learning models (ResNet-like, FT-Transformer, DeepGBM) to perform multi-class and binary classification.

### Tech Stack

- Data processing: `pandas`, `numpy`
- EDA/visualization: `matplotlib`, `seaborn`
- Sampling/imbalance handling: `imbalanced-learn` (SMOTENC), `CTGAN`, `Optuna` (CTGAN hyperparameters), region- and year-based splits
- Modeling (GBDT): `LightGBM`, `XGBoost` (GPU options included; custom CSI evaluation)
- Modeling (deep learning): PyTorch-based `ResNetLike`, `FTTransformer`, `DeepGBM`
- Optimization: `hyperopt` (LightGBM/XGBoost), `Optuna` (CTGAN)
- Utilities/persistence: `joblib`

### System Architecture (Pipeline)

1) Data collection/ingestion: `data/ASOS`, `data/dataon`
2) Merging/preprocessing: `1.data_preprocessing/0.air_data_merge.ipynb` → `1.data_preprocessing/1.data_merge.ipynb` → `1.data_preprocessing/2.eda_preproccesing.ipynb` → `1.data_preprocessing/3.make_train_test.ipynb`
3) Data augmentation (imbalance handling): `SMOTENC` → `CTGAN(+Optuna)` → rule-based filtering, in `2.make_oversample_data/`
4) Data splitting: per region (`*_train.csv`, `*_test.csv`) with a year-based 3-fold holdout
5) Training: GBDT (`5.optima/*/`) and deep learning notebooks
6) Evaluation/analysis: custom `CSI` plus F1/Accuracy, `visualization/model_visualize.ipynb`, `find_reason/*` (trends, distribution comparison)
7) Ensembling/final evaluation: `model_voting_test_best_sample/ensemble__voting_best_sample.ipynb`, `final_test/final.ipynb`

### TL;DR (Quick Start)

1) Set up a Python environment and install the required packages

```bash
pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install pandas numpy scikit-learn matplotlib seaborn imbalanced-learn optuna ctgan xgboost lightgbm joblib hyperopt
```

2) Place the data
- Put the raw/intermediate artifacts under `data/`. See `data/data_for_modeling/` for the training CSV/feather files.
- Alternatively, download the `data/` folder from the Hugging Face repository:
```bash
git clone https://huggingface.co/bong9513/visibility_prediction
# After cloning, copy or use visibility_prediction/data/ as the project's data/ directory
```

3) Run oversampling (SMOTE/CTGAN)

```bash
cd Analysis_code/2.make_oversample_data
# SMOTE only
python smote_only/smote_sample_1.py
# SMOTENC + CTGAN
python smotenc_ctgan/smotenc_ctgan_sample_10000_1.py
```

4) Train models or download pretrained ones
- **Option A: train the models yourself**
  - GBDT optimization/training example (Seoul):
  ```bash
  cd Analysis_code/5.optima
  python lgb_smote/LGB_smote_seoul.py
  python xgb_smote/XGB_smote_seoul.py
  ```
  - Deep learning training/evaluation: run the notebooks (`.ipynb` files under `Analysis_code/`)
- **Option B: use pretrained models**
  - Download the pretrained models from the Hugging Face repository:
  ```bash
  git clone https://huggingface.co/bong9513/visibility_prediction
  # After cloning, copy visibility_prediction/save_model/ to Analysis_code/save_model/
  ```

---

### Project Structure

```
visibility_prediction/
├── Analysis_code/
│   ├── 1.data_preprocessing/        # data merging and preprocessing
│   │   ├── 0.air_data_merge.ipynb
│   │   ├── 1.data_merge.ipynb
│   │   ├── 2.eda_preproccesing.ipynb
│   │   └── 3.make_train_test.ipynb
│   ├── 2.make_oversample_data/      # oversampling (SMOTE/CTGAN)
│   │   ├── smote_only/              # SMOTE only
│   │   ├── only_ctgan/              # CTGAN only
│   │   └── smotenc_ctgan/           # SMOTENC + CTGAN combined
│   ├── 3.sampled_data_analysis/     # analysis of the sampled data
│   ├── 4.sampling_data_test/        # performance tests on sampled data
│   ├── 5.optima/                    # model optimization and training
│   │   ├── lgb_smote/               # LightGBM (SMOTE)
│   │   ├── lgb_pure/                # LightGBM (original data)
│   │   ├── lgb_ctgan10000/          # LightGBM (CTGAN 10000)
│   │   ├── lgb_smotenc_ctgan20000/  # LightGBM (SMOTENC+CTGAN 20000)
│   │   ├── xgb_smote/               # XGBoost (SMOTE)
│   │   ├── xgb_pure/                # XGBoost (original data)
│   │   ├── xgb_ctgan10000/          # XGBoost (CTGAN 10000)
│   │   ├── xgb_smotenc_ctgan20000/  # XGBoost (SMOTENC+CTGAN 20000)
│   │   ├── resnet_like_smote/       # ResNet-like (SMOTE)
│   │   ├── resnet_like_pure/        # ResNet-like (original data)
│   │   ├── resnet_like_ctgan10000/  # ResNet-like (CTGAN 10000)
│   │   ├── resnet_like_smotenc_ctgan20000/    # ResNet-like (SMOTENC+CTGAN 20000)
│   │   ├── ft_transformer_smote/    # FT-Transformer (SMOTE)
│   │   ├── ft_transformer_pure/     # FT-Transformer (original data)
│   │   ├── ft_transformer_ctgan10000/         # FT-Transformer (CTGAN 10000)
│   │   ├── ft_transformer_smotenc_ctgan20000/ # FT-Transformer (SMOTENC+CTGAN 20000)
│   │   ├── deepgbm_smote/           # DeepGBM (SMOTE)
│   │   ├── deepgbm_pure/            # DeepGBM (original data)
│   │   ├── deepgbm_ctgan10000/      # DeepGBM (CTGAN 10000)
│   │   └── deepgbm_smotenc_ctgan20000/        # DeepGBM (SMOTENC+CTGAN 20000)
│   ├── 6.optima_models_analysis/    # analysis of the optimized models
│   ├── models/                      # deep learning model definitions and weights
│   │   ├── deepgbm.py
│   │   ├── ft_transformer.py
│   │   ├── resnet_like.py
│   │   ├── best_resnet_model.pth
│   │   └── tabnet_model.zip
│   ├── save_model/                  # trained models (downloadable from Hugging Face)
│   ├── optimization_history/        # optimization history (downloadable from Hugging Face)
│   ├── visualization/               # model visualization
│   │   └── model_visualize.ipynb
│   ├── find_reason/                 # per-region trend/root-cause analysis notebooks
│   ├── model_voting_test_best_sample/
│   │   └── ensemble__voting_best_sample.ipynb
│   └── final_test/
│       └── final.ipynb
├── data/
│   ├── ASOS/                        # weather data
│   ├── dataon/                      # air-pollution data (large per-day CSVs)
│   ├── data_for_modeling/           # per-region train/test CSV and feather files
│   ├── data_for_demo/
│   ├── data_oversampled/            # oversampled data
│   │   ├── smote/
│   │   ├── ctgan7000/
│   │   ├── ctgan10000/
│   │   └── ctgan20000/
│   └── oversampled_data_test_for_model/  # oversampled data for model testing
└── README.md
```

---

### Data and Variables

- Target variables
  - `visi`: visibility (continuous). Bands used in the synthetic-sample filtering rules: class 0 is [0, 100), class 1 is [100, 500), and class 2 covers everything else.
  - `multi_class`: multi-class label (integer 0/1/2)
  - `binary_class`: binary label. Rule: `binary_class = 0 if multi_class == 2 else 1`

- Main feature groups (as used in the code)
  - Weather (ASOS): `temp_C`, `precip_mm`, `wind_speed`, `wind_dir` (`'정온'` (calm) → 0), `hm`, `vap_pressure`, `dewpoint_C`, `loc_pressure`, `sea_pressure`, `solarRad`, `snow_cm`, `cloudcover` (int), `lm_cloudcover` (int), `low_cloudbase`, `groundtemp`
  - Air pollution (DataOn): `O3`, `NO2`, `PM10`, `PM25`
  - Time/cyclical: `year` (int), `month` (int), `hour` (int), `hour_sin`, `hour_cos`, `month_sin`, `month_cos`
  - Derived: `ground_temp - temp_C` (ground-air temperature difference)

- Categorical variables (from the model/sampling perspective)
  - `wind_dir`, `cloudcover`, `lm_cloudcover`, and the `int`-typed time variables (`year`, `month`, `hour`) are treated as categorical by SMOTENC/GBDT (the code auto-detects the positional indices of non-`float64` columns)

- Preprocessing rules (excerpt)
  - Replace `'정온'` (calm) in `wind_dir` with "0", then cast to integer
  - Cast `cloudcover`, `lm_cloudcover` to integer
  - During training, split off the target/auxiliary columns (`multi_class`, `binary_class`) and recompute them when needed
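
The labeling rules above can be sketched in pandas. This is an illustrative helper (the name `add_class_labels` is not from the repository); the band edges follow the filtering rules quoted above.

```python
import pandas as pd

def add_class_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Derive multi_class / binary_class from visi (illustrative helper)."""
    out = df.copy()
    # class 0: visi in [0, 100); class 1: [100, 500); class 2: everything else
    out["multi_class"] = 2
    out.loc[(out["visi"] >= 0) & (out["visi"] < 100), "multi_class"] = 0
    out.loc[(out["visi"] >= 100) & (out["visi"] < 500), "multi_class"] = 1
    # binary rule from the README: 0 if multi_class == 2 else 1
    out["binary_class"] = (out["multi_class"] != 2).astype(int)
    return out

df = add_class_labels(pd.DataFrame({"visi": [50.0, 99.9, 100.0, 499.9, 500.0, 2000.0]}))
print(df["multi_class"].tolist())   # [0, 0, 1, 1, 2, 2]
print(df["binary_class"].tolist())  # [1, 1, 1, 1, 0, 0]
```

Note that the edges are half-open, so `visi == 100.0` falls into class 1 and `visi == 500.0` into class 2.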

---

### EDA and Preprocessing

- Merging/cleanup
  - Drop the index column `Unnamed: 0`
  - Type consistency: `cloudcover`, `lm_cloudcover` as integers; `year`, `month`, `hour` as integers
  - Special-value replacement: `wind_dir == '정온'` (calm) → "0", then cast to integer

- Feature engineering
  - Cyclical encodings: `hour_sin`, `hour_cos`, `month_sin`, `month_cos`
  - Difference feature: `ground_temp - temp_C`
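
The cyclical encodings and the temperature-difference feature can be sketched as follows. The helper and the `ground_air_diff` column name are illustrative; the raw columns `hour`, `month`, `groundtemp`, `temp_C` are the ones named above.

```python
import numpy as np
import pandas as pd

def add_cyclic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sin/cos encodings for hour/month plus the ground-air temperature gap (sketch)."""
    out = df.copy()
    out["hour_sin"] = np.sin(2 * np.pi * out["hour"] / 24)
    out["hour_cos"] = np.cos(2 * np.pi * out["hour"] / 24)
    out["month_sin"] = np.sin(2 * np.pi * out["month"] / 12)
    out["month_cos"] = np.cos(2 * np.pi * out["month"] / 12)
    # difference feature: ground temperature minus air temperature
    out["ground_air_diff"] = out["groundtemp"] - out["temp_C"]
    return out

sample = pd.DataFrame({"hour": [0, 6, 12], "month": [1, 6, 12],
                       "groundtemp": [5.0, 20.0, 1.0], "temp_C": [3.0, 18.0, 2.0]})
enc = add_cyclic_features(sample)
print(enc[["hour_sin", "hour_cos", "ground_air_diff"]].round(3))
```

The sin/cos pair keeps the cyclical neighborhood intact: hour 23 and hour 0 end up close in feature space, which a raw integer hour would not capture.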

- Distribution/trend analysis
  - Per-region time-series trends: `Analysis_code/find_reason/*_trend.ipynb` (seoul, incheon, busan, daegu, daejeon, gwangju)
  - Distribution comparison/shift detection: `Analysis_code/find_reason/wasserstein_distance.ipynb` (quantifies distribution differences via the Wasserstein distance)
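
A minimal sketch of the Wasserstein-distance check. The notebook's actual inputs are not shown here, so synthetic normal samples stand in for per-year feature values:

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(42)
# Two samples of a feature (say, PM10) from different periods; the distance
# grows as the distributions drift apart.
same = wasserstein_distance(rng.normal(30, 5, 1000), rng.normal(30, 5, 1000))
shifted = wasserstein_distance(rng.normal(30, 5, 1000), rng.normal(45, 5, 1000))
print(round(same, 2), round(shifted, 2))  # the shifted pair is close to the mean gap of 15
```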

- Data splitting
  - Per-region datasets (`*_train.csv`, `*_test.csv`)
  - Year-based 3-fold holdout (combinations of 2018–2020) to validate generalization

### Imbalance Handling and Synthetic Sampling

- SMOTENC
  - Categorical indices: positional indices of the non-`float64` input columns
  - Example sampling strategies: `{0: 10000, 1: 10000, 2: existing count}`, or depending on data size `{0: 500/1000, 1: ceil(n1/100)*100, 2: n2}`
  - Recomputation: after sampling, rebuild `binary_class` from `multi_class` along with the cyclical/difference derivations

- CTGAN (+Optuna)
  - For classes 0 and 1, search `embedding_dim`, `generator_dim`, `discriminator_dim`, `pac`, `batch_size`, `discriminator_steps` with Optuna, then synthesize
  - Quality filter on generated samples: `class 0 → 0 ≤ visi < 100`, `class 1 → 100 ≤ visi < 500`
  - After the final merge, rebuild the derived/auxiliary features (`binary_class`, cyclical/difference terms)
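
The quality filter can be sketched in pandas. The helper is illustrative; in the pipeline it would be applied to the CTGAN output before merging:

```python
import pandas as pd

def filter_synthetic(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only synthetic rows whose visi falls in the class's visibility band (sketch)."""
    keep = (
        ((df["multi_class"] == 0) & (df["visi"] >= 0) & (df["visi"] < 100))
        | ((df["multi_class"] == 1) & (df["visi"] >= 100) & (df["visi"] < 500))
    )
    return df[keep].reset_index(drop=True)

synthetic = pd.DataFrame({
    "visi":        [40.0, 150.0, 600.0, 99.0, 450.0],
    "multi_class": [0,    1,     1,     1,    1],
})
print(len(filter_synthetic(synthetic)))  # 3 rows survive
```

Rows whose generated `visi` contradicts their class label (here the 600.0 and 99.0 rows labeled class 1) are discarded rather than relabeled.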

- Outputs
  - Per-region CSVs saved under `data/data_oversampled/smote/`, `data/data_oversampled/ctgan7000/`, `data/data_oversampled/ctgan10000/`, `data/data_oversampled/ctgan20000/`

---

### Model Architectures (Detailed)

- Deep learning (tabular)
  - `Analysis_code/models/resnet_like.py`
    - Input: `x_num [B, N_num]`, `x_cat [B, N_cat]` → concat → input linear layer (`d_main=128`) → residual blocks (`n_blocks=4`, `d_hidden=64`, `dropout_first=0.25`) → output layer
    - Output: `num_classes == 2 → 1 logit`, `> 2 → K logits`
  - `Analysis_code/models/ft_transformer.py`
    - Numeric: Linear (`num_features → d_token=192`); categorical: one `nn.Embedding(d_token)` per entry of `cat_cardinalities`, then combined
    - Encoder: `TransformerEncoderLayer(d_model=d_token, nhead=8, dropout≈0.2)` × `n_blocks=6` → mean pooling → classification head
  - `Analysis_code/models/deepgbm.py`
    - Numeric Linear (`d_main=128`) plus summed categorical embeddings → residual MLP blocks (`n_blocks=4`, `d_hidden=64`, `dropout≈0.2`) → classification head
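
The residual-block pattern described for `resnet_like.py` can be sketched in PyTorch. This is an illustrative module, not the repository's actual implementation; only the default sizes (`d_main=128`, `d_hidden=64`, `dropout_first=0.25`) are taken from the description above:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One ResNet-like tabular block (illustrative sketch)."""
    def __init__(self, d_main: int = 128, d_hidden: int = 64, dropout_first: float = 0.25):
        super().__init__()
        self.net = nn.Sequential(
            nn.BatchNorm1d(d_main),
            nn.Linear(d_main, d_hidden),
            nn.ReLU(),
            nn.Dropout(dropout_first),
            nn.Linear(d_hidden, d_main),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.net(x)  # skip connection keeps the main width d_main

x = torch.randn(8, 128)  # [batch, d_main]
block = ResidualBlock()
print(block(x).shape)    # torch.Size([8, 128])
```

Because each block maps `d_main → d_main`, blocks can be stacked `n_blocks` times without projection layers between them.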

- GBDT
  - LightGBM (`5.optima/lgb_smote/LGB_smote_seoul.py`): `objective='multiclassova'`, `n_estimators≈4000`, early stopping, GPU option available; `hyperopt` searches `max_depth`, `min_child_weight`, `num_leaves`, `subsample`, `learning_rate`
  - XGBoost (`5.optima/xgb_smote/XGB_smote_seoul.py`): `objective='multi:softprob'`, `tree_method='hist'`, `enable_categorical=True`, GPU option; `hyperopt` searches the core hyperparameters; `eval_metric=CSI`

---

### Hyperparameter Optimization

All models are tuned to maximize the CSI (Critical Success Index) score. The GBDT models use `hyperopt` (TPE algorithm); the deep learning models use `Optuna` (TPE sampler).

#### LightGBM Hyperparameter Search Space

- **Optimization library**: `hyperopt` (TPE algorithm)
- **Number of trials**: `max_evals=100`
- **Objective metric**: CSI (mean over 3-fold cross-validation)
- **Search space**:
  - `learning_rate`: `hp.loguniform('learning_rate', np.log(0.01), np.log(0.2))` - log-uniform over [0.01, 0.2]
  - `max_depth`: `hp.quniform('max_depth', 3, 15, 1)` - integer uniform over [3, 15]
  - `num_leaves`: `hp.quniform('num_leaves', 20, 150, 1)` - integer uniform over [20, 150] (kept below 2^max_depth)
  - `min_child_weight`: `hp.quniform('min_child_weight', 1, 20, 1)` - integer uniform over [1, 20]
  - `subsample`: `hp.uniform('subsample', 0.6, 1.0)` - uniform over [0.6, 1.0]
  - `colsample_bytree`: `hp.uniform('colsample_bytree', 0.6, 1.0)` - uniform over [0.6, 1.0]
  - `reg_alpha`: `hp.uniform('reg_alpha', 0.0, 1.0)` - uniform over [0.0, 1.0] (L1 regularization)
  - `reg_lambda`: `hp.uniform('reg_lambda', 0.0, 1.0)` - uniform over [0.0, 1.0] (L2 regularization)
- **Fixed parameters**: `n_estimators=4000`, `early_stopping_rounds=400`, `device='gpu'`, `objective='multiclassova'`, `random_state=42`

#### XGBoost Hyperparameter Search Space

- **Optimization library**: `hyperopt` (TPE algorithm)
- **Number of trials**: `max_evals=100`
- **Objective metric**: CSI (mean over 3-fold cross-validation, via the custom `eval_metric_csi` function)
- **Search space**:
  - `learning_rate`: `hp.loguniform('learning_rate', np.log(0.01), np.log(0.2))` - log-uniform over [0.01, 0.2]
  - `max_depth`: `hp.quniform('max_depth', 3, 12, 1)` - integer uniform over [3, 12]
  - `min_child_weight`: `hp.quniform('min_child_weight', 1, 20, 1)` - integer uniform over [1, 20]
  - `gamma`: `hp.uniform('gamma', 0, 5)` - uniform over [0, 5] (minimum loss reduction required to split)
  - `subsample`: `hp.uniform('subsample', 0.6, 1.0)` - uniform over [0.6, 1.0]
  - `colsample_bytree`: `hp.uniform('colsample_bytree', 0.6, 1.0)` - uniform over [0.6, 1.0]
  - `reg_alpha`: `hp.uniform('reg_alpha', 0.0, 1.0)` - uniform over [0.0, 1.0] (L1 regularization)
  - `reg_lambda`: `hp.uniform('reg_lambda', 0.0, 1.0)` - uniform over [0.0, 1.0] (L2 regularization)
- **Fixed parameters**: `n_estimators=4000`, `early_stopping_rounds=400`, `tree_method='hist'`, `device='cuda'`, `enable_categorical=True`, `objective='multi:softprob'`, `random_state=42`

#### FT-Transformer Hyperparameter Search Space

- **Optimization library**: `Optuna` (TPE sampler)
- **Number of trials**: `n_trials=100`
- **Pruning**: `MedianPruner(n_warmup_steps=10)` - observe the first 10 epochs, then prune
- **Objective metric**: CSI (mean over 3-fold cross-validation)
- **Search space**:
  - `d_token`: `trial.suggest_int("d_token", 64, 256, step=32)` - integer over [64, 256] in steps of 32 (64, 96, 128, 160, 192, 224, 256)
  - `n_blocks`: `trial.suggest_int("n_blocks", 2, 6)` - integer over [2, 6] (shallower depth to curb overfitting)
  - `n_heads`: `trial.suggest_categorical("n_heads", [4, 8])` - categorical, choices [4, 8]
  - `attention_dropout`: `trial.suggest_float("attention_dropout", 0.1, 0.4)` - float over [0.1, 0.4]
  - `ffn_dropout`: `trial.suggest_float("ffn_dropout", 0.1, 0.4)` - float over [0.1, 0.4]
  - `lr` (learning rate): `trial.suggest_float("lr", 1e-5, 1e-2, log=True)` - log-scale float over [1e-5, 1e-2]
  - `weight_decay`: `trial.suggest_float("weight_decay", 1e-4, 1e-1, log=True)` - log-scale float over [1e-4, 1e-1]
  - `batch_size`: `trial.suggest_categorical("batch_size", [32, 64, 128, 256])` - categorical, choices [32, 64, 128, 256]
- **Structural constraint**: `d_token` must be a multiple of `n_heads` (adjusted automatically in the code)
- **Fixed parameters**: `num_classes=3`, `optimizer='AdamW'`, `epochs=200`, `patience=12`, `scheduler='ReduceLROnPlateau'` (factor=0.5, patience=3), `random_state=42`

#### ResNet-like Hyperparameter Search Space

- **Optimization library**: `Optuna` (TPE sampler)
- **Number of trials**: `n_trials=100`
- **Pruning**: `MedianPruner(n_warmup_steps=10)` - observe the first 10 epochs, then prune
- **Objective metric**: CSI (mean over 3-fold cross-validation)
- **Search space**:
  - `d_main`: `trial.suggest_int("d_main", 64, 256, step=32)` - integer over [64, 256] in steps of 32 (64, 96, 128, 160, 192, 224, 256)
  - `d_hidden`: `trial.suggest_int("d_hidden", 64, 512, step=64)` - integer over [64, 512] in steps of 64 (64, 128, 192, 256, 320, 384, 448, 512)
  - `n_blocks`: `trial.suggest_int("n_blocks", 2, 5)` - integer over [2, 5] (kept fairly shallow)
  - `dropout_first`: `trial.suggest_float("dropout_first", 0.1, 0.4)` - float over [0.1, 0.4]
  - `dropout_second`: `trial.suggest_float("dropout_second", 0.0, 0.2)` - float over [0.0, 0.2]
  - `lr` (learning rate): `trial.suggest_float("lr", 1e-5, 1e-2, log=True)` - log-scale float over [1e-5, 1e-2]
  - `weight_decay`: `trial.suggest_float("weight_decay", 1e-4, 1e-1, log=True)` - log-scale float over [1e-4, 1e-1]
  - `batch_size`: `trial.suggest_categorical("batch_size", [32, 64, 128, 256])` - categorical, choices [32, 64, 128, 256]
- **Fixed parameters**: `num_classes=3`, `optimizer='AdamW'`, `epochs=200`, `patience=12`, `scheduler='ReduceLROnPlateau'` (factor=0.5, patience=3), `random_state=42`

#### DeepGBM Hyperparameter Search Space

- **Optimization library**: `Optuna` (TPE sampler)
- **Number of trials**: `n_trials=100`
- **Pruning**: `MedianPruner(n_warmup_steps=10)` - observe the first 10 epochs, then prune
- **Objective metric**: CSI (mean over 3-fold cross-validation)
- **Search space**:
  - `d_main`: `trial.suggest_int("d_main", 64, 256, step=32)` - integer over [64, 256] in steps of 32 (64, 96, 128, 160, 192, 224, 256)
  - `d_hidden`: `trial.suggest_int("d_hidden", 64, 256, step=64)` - integer over [64, 256] in steps of 64 (64, 128, 192, 256)
  - `n_blocks`: `trial.suggest_int("n_blocks", 2, 6)` - integer over [2, 6]
  - `dropout`: `trial.suggest_float("dropout", 0.1, 0.4)` - float over [0.1, 0.4]
  - `lr` (learning rate): `trial.suggest_float("lr", 1e-5, 1e-2, log=True)` - log-scale float over [1e-5, 1e-2]
  - `weight_decay`: `trial.suggest_float("weight_decay", 1e-4, 1e-1, log=True)` - log-scale float over [1e-4, 1e-1]
  - `batch_size`: `trial.suggest_categorical("batch_size", [32, 64, 128, 256])` - categorical, choices [32, 64, 128, 256]
- **Fixed parameters**: `num_classes=3`, `optimizer='AdamW'`, `epochs=200`, `patience=12`, `scheduler='ReduceLROnPlateau'` (factor=0.5, patience=3), `random_state=42`

#### Common Optimization Settings

- **Cross-validation**: all models use a year-based 3-fold holdout
  - Fold 1: Train [2018, 2019] → Val 2020
  - Fold 2: Train [2018, 2020] → Val 2019
  - Fold 3: Train [2019, 2020] → Val 2018
- **Objective metric**: CSI (Critical Success Index) - the mean CSI over all folds is maximized
- **Optimization algorithm**: TPE (Tree-structured Parzen Estimator)
- **Reproducibility**: `random_state=42` fixed
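
The fold scheme above can be sketched as a small generator. This is an illustrative helper (it assumes a `year` column, as described in the data section):

```python
import pandas as pd
from itertools import combinations

def year_holdout_folds(df: pd.DataFrame, years=(2018, 2019, 2020)):
    """Yield (train_df, val_df) pairs: train on two years, validate on the held-out one."""
    for train_years in combinations(years, 2):
        val_year = next(y for y in years if y not in train_years)
        yield df[df["year"].isin(train_years)], df[df["year"] == val_year]

df = pd.DataFrame({"year": [2018, 2018, 2019, 2019, 2020, 2020], "visi": range(6)})
for train, val in year_holdout_folds(df):
    print(sorted(train["year"].unique()), "->", val["year"].unique()[0])
# [2018, 2019] -> 2020
# [2018, 2020] -> 2019
# [2019, 2020] -> 2018
```

Splitting by whole years, rather than randomly by row, prevents hourly observations from the same weather episode leaking across the train/validation boundary.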

---

### Training/Validation Strategy

- Year-based 3-fold holdout (example)
  - Fold 1: Train [2018, 2019] → Val 2020
  - Fold 2: Train [2018, 2020] → Val 2019
  - Fold 3: Train [2019, 2020] → Val 2018
- Each region is trained separately (e.g. `seoul_train.csv`)

---

### Evaluation Metric

- Custom multi-class CSI (Critical Success Index)

```python
# cm: confusion matrix over classes 0/1/2
# (assuming scikit-learn's convention: rows = true labels, columns = predictions)
H = cm[0, 0] + cm[1, 1]                        # hits on the low-visibility classes 0 and 1
F = cm[1, 0] + cm[2, 0] + cm[0, 1] + cm[2, 1]  # false alarms: wrongly predicted as class 0/1
M = cm[0, 2] + cm[1, 2]                        # misses: class 0/1 predicted as class 2
CSI = H / (H + F + M + 1e-10)
```
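
A runnable version of the metric, assuming scikit-learn's row-equals-true-label convention for the confusion matrix and labels fixed to `[0, 1, 2]`:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def csi_score(y_true, y_pred):
    """Multi-class CSI: hits on classes 0/1, false alarms and misses w.r.t. class 2."""
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
    hits = cm[0, 0] + cm[1, 1]
    false_alarms = cm[1, 0] + cm[2, 0] + cm[0, 1] + cm[2, 1]
    misses = cm[0, 2] + cm[1, 2]
    return hits / (hits + false_alarms + misses + 1e-10)

y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2, 2, 1])
print(round(csi_score(y_true, y_pred), 3))  # 0.4
```

Unlike plain accuracy, correct predictions on the majority "clear" class 2 do not inflate the score; only the rare low-visibility classes count as hits.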

- In addition: accuracy, F1, etc. are tracked alongside in the notebooks/scripts

---

### How to Run (Detailed)

- Environment
  - Python 3.8+ recommended; a CUDA-capable GPU speeds up CTGAN/GBDT if available
  - If LightGBM's GPU build is not installed, use the CPU version via `pip install lightgbm`, or build it with GPU support

- Data preparation
  - `data/ASOS/`: raw weather data by year
  - `data/dataon/`: air-pollution per-day CSVs (large)
  - `data/data_for_modeling/`: per-region train/test sets (`*_train.csv`, `*_test.csv`, `df_*.feather`)
  - **Download from Hugging Face**: the full `data/` folder is available from the [Hugging Face repository](https://huggingface.co/bong9513/visibility_prediction/tree/main/data)
  ```bash
  git clone https://huggingface.co/bong9513/visibility_prediction
  # After cloning, copy visibility_prediction/data/ to the project's data/ directory
  ```

- Preprocessing/exploration
  - `Analysis_code/1.data_preprocessing/0.air_data_merge.ipynb` → `1.data_preprocessing/1.data_merge.ipynb` → `1.data_preprocessing/2.eda_preproccesing.ipynb` → `1.data_preprocessing/3.make_train_test.ipynb`

- Oversampling
  - Run the scripts in `Analysis_code/2.make_oversample_data/` (see the TL;DR above)

- GBDT optimization/training
  - **Option 1: train the models yourself**
    - Run `Analysis_code/5.optima/lgb_smote/LGB_smote_seoul.py` and `5.optima/xgb_smote/XGB_smote_seoul.py`
    - Resulting models are saved as `.pkl` files under `Analysis_code/save_model/`
    - Per-region scripts exist for each model (seoul, incheon, busan, daegu, daejeon, gwangju)
  - **Option 2: use pretrained models**
    - Pretrained models and optimization histories can be downloaded from the Hugging Face repository
    - `save_model/`: pretrained models from the [Hugging Face repository](https://huggingface.co/bong9513/visibility_prediction/tree/main/save_model)
    - `optimization_history/`: optimization history files from the [Hugging Face repository](https://huggingface.co/bong9513/visibility_prediction/tree/main/optimization_history)
  ```bash
  git clone https://huggingface.co/bong9513/visibility_prediction
  # After cloning, copy visibility_prediction/save_model/ and visibility_prediction/optimization_history/
  # to Analysis_code/save_model/ and Analysis_code/optimization_history/ respectively
  ```
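
A minimal round-trip showing how a `.pkl` persisted with `joblib` is reloaded downstream. The file name and payload below are illustrative; the actual model files live under `Analysis_code/save_model/`:

```python
import joblib
from pathlib import Path

# Illustrative path; real names follow the per-model/per-region scripts.
model_path = Path("save_model_demo") / "lgb_smote_seoul.pkl"
model_path.parent.mkdir(exist_ok=True)

# In the training scripts this would be the fitted estimator, e.g. an LGBMClassifier.
joblib.dump({"model": "placeholder", "features": ["temp_C", "hm"]}, model_path)

restored = joblib.load(model_path)
print(restored["features"])  # ['temp_C', 'hm']
```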

- Deep learning training
  - **Option 1: train the models yourself**
    - Run the per-region scripts in each model folder under `Analysis_code/5.optima/` (`resnet_like_*`, `ft_transformer_*`, `deepgbm_*`)
    - Example: `5.optima/resnet_like_smote/resnet_like_smote_seoul.py`
    - Model definitions live in `Analysis_code/models/` (`deepgbm.py`, `ft_transformer.py`, `resnet_like.py`)
    - Visualization: `Analysis_code/visualization/model_visualize.ipynb`
  - **Option 2: use pretrained models**
    - Pretrained deep learning models are available in the `save_model/` folder of the Hugging Face repository
    - Download the relevant model files from the [Hugging Face repository](https://huggingface.co/bong9513/visibility_prediction/tree/main/save_model)

- Ensembling/final evaluation
  - `Analysis_code/model_voting_test_best_sample/ensemble__voting_best_sample.ipynb`
  - `Analysis_code/final_test/final.ipynb`

---

### Model I/O Specification (Summary)

- Numeric input `x_num`: `float32` tensor of shape `[batch, num_numeric_features]`
- Categorical input `x_cat`: integer index tensor of shape `[batch, num_categorical_features]`
- Output: binary (1 logit) or multi-class (K logits). See the notebooks for loss/threshold settings

---

### Reproducibility/Seeds

- Fixed seeds are used throughout: `random_state=42` (SMOTENC) and values such as `random_state=120` in the model scripts
- Results may still vary with data versions and hardware, so set folds/seeds explicitly
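
A seeding helper along these lines keeps runs comparable. This is a sketch (the function name is illustrative); the deep learning scripts would additionally call `torch.manual_seed(seed)`:

```python
import os
import random
import numpy as np

def set_seed(seed: int = 42) -> None:
    """Pin the common RNG sources used across the scripts (sketch)."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)

set_seed(42)
a = np.random.rand(3)
set_seed(42)
b = np.random.rand(3)
print(np.array_equal(a, b))  # True
```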

---

### Caveats/Troubleshooting

- The `sys.path.append(...)` in `5.optima/lgb_smote/LGB_smote_seoul.py` is an environment-specific path. In a normal environment, `from lightgbm import LGBMClassifier` should work with that line removed.
- The scripts assume relative paths. Before running, make sure the current working directory is under `Analysis_code/5.optima/`.
- If the `'정온'` (calm) value in `wind_dir` is not replaced and cast, LightGBM/XGBoost may raise errors.
- `dataon/` is very large. Ensure sufficient memory, or process it year by year or region by region.

---

### Dependencies

- Python 3.8+
- PyTorch, pandas, numpy, scikit-learn, imbalanced-learn, optuna, ctgan, xgboost, lightgbm, joblib, matplotlib, seaborn, hyperopt

---

### License/Citation

- License: to be updated
- When citing this project or its outputs, please credit the `visibility_prediction` repository and the data sources used (ASOS, DataOn).