Visibility Prediction Modeling Project

This project predicts visibility (visi) by integrating meteorological, air-pollution, and aviation data (ASOS, DataOn, TAF). Imbalanced data is augmented with SMOTENC/CTGAN, and GBDT models (LightGBM/XGBoost) are combined with deep learning models specialized for tabular data (ResNet-like, FT-Transformer, DeepGBM) to perform multiclass and binary classification.

Tech Stack

  • Data processing: pandas, numpy
  • EDA/visualization: matplotlib, seaborn
  • Sampling/imbalance handling: imbalanced-learn (SMOTENC), CTGAN, Optuna (CTGAN hyperparameters), region/year-based splitting
  • Modeling (GBDT): LightGBM, XGBoost (with GPU options and a custom CSI evaluation metric)
  • Modeling (deep learning): PyTorch-based ResNetLike, FTTransformer, DeepGBM
  • Optimization: hyperopt (LightGBM/XGBoost), Optuna (CTGAN)
  • Utilities/persistence: joblib

System Architecture (Pipeline)

  1. Data collection/loading: data/ASOS, data/dataon, data/data_for_TAF
  2. Merging/preprocessing: 0.air_data_merge.ipynb → 1.data_merge.ipynb → 2.eda_preproccesing.ipynb
  3. Data augmentation (imbalance handling): in Analysis_code/make_oversample_data/, SMOTENC → CTGAN (+Optuna) → rule-based filtering
  4. Data splitting: per region (*_train.csv, *_test.csv), year-based 3-fold hold-out
  5. Training: GBDT (optima/*.py) and deep learning notebooks (deeplearning_model_*)
  6. Evaluation/analysis: custom CSI + F1/accuracy, model_visualize.ipynb, find_reason/* (trends, distribution comparison)
  7. Ensemble/final: model_voting_test_best_sample/*, final_test/final.ipynb

TL;DR (Quick Start)

  1. Set up a Python environment and install the required packages
pip install -U torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install pandas numpy scikit-learn matplotlib seaborn imbalanced-learn optuna ctgan xgboost lightgbm joblib
  2. Place the data
  • Place raw and intermediate artifacts under data/. For training CSV/feather files, see data/data_for_modeling/.
  3. Run oversampling (SMOTE/CTGAN)
cd Analysis_code/make_oversample_data
python smote_sample_1.py
python oversampling_code.py
  4. GBDT tuning/training example (Seoul)
cd ../optima
python LGB_smote_seoul.py
python XGB_smote_seoul.py
  5. Deep learning model training/evaluation: run the notebooks (.ipynb files in Analysis_code/)

Project Structure

visibility_prediction/
├── Analysis_code/
│   ├── 0.air_data_merge.ipynb
│   ├── 1.data_merge.ipynb
│   ├── 2.eda_preproccesing.ipynb
│   ├── 3.oversampling.ipynb
│   ├── deeplearning_model_binary.ipynb
│   ├── deeplearning_model_multi.ipynb
│   ├── make_train_test.ipynb
│   ├── model_visualize.ipynb
│   ├── final_test/
│   │   └── final.ipynb
│   ├── find_reason/                # per-region trend/cause analysis notebooks
│   ├── sampling_data_test/         # notebooks testing performance on sampled data
│   ├── model_voting_test_best_sample/
│   │   └── ensemble__voting_best_sample.ipynb
│   ├── make_oversample_data/
│   │   ├── oversampling_code.py    # SMOTENC+CTGAN pipeline
│   │   ├── smote_sample_1.py       # SMOTE sampling, including year handling/preprocessing
│   │   └── (gan_sample_*.py, etc.)
│   ├── optima/                     # GBDT hyperparameter search/training scripts
│   │   ├── LGB_smote_seoul.py
│   │   └── XGB_smote_seoul.py
│   ├── models/
│   │   ├── best_resnet_model.pth
│   │   └── tabnet_model.zip
│   ├── deepgbm.py
│   ├── ft_transformer.py
│   └── resnet_like.py
├── data/
│   ├── ASOS/                       # meteorological data
│   ├── dataon/                     # air pollution (large daily CSVs)
│   ├── data_for_modeling/          # per-region train/test CSV and feather files
│   ├── data_for_demo/
│   ├── data_for_TAF/               # airport TAF (aviation weather) CSVs
│   └── data_oversampled/
│       ├── smote/
│       ├── ctgan7000/
│       ├── ctgan10000/
│       └── ctgan20000/
└── README.md

๋ฐ์ดํ„ฐ ๋ฐ ๋ณ€์ˆ˜(Variables)

  • ๋ชฉํ‘œ ๋ณ€์ˆ˜

    • visi: ๊ฐ€์‹œ๊ฑฐ๋ฆฌ(์—ฐ์†๊ฐ’). ํ•ฉ์„ฑ ํ‘œ๋ณธ ํ•„ํ„ฐ๋ง ๊ทœ์น™์—์„œ ํ™•์ธ๋˜๋Š” ๊ตฌ๊ฐ„ ์˜ˆ์‹œ: class 0์€ [0,100), class 1์€ [100,500), class 2๋Š” ๊ทธ ์™ธ ๊ตฌ๊ฐ„์œผ๋กœ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
    • multi_class: ๋‹ค์ค‘ ๋ถ„๋ฅ˜ ๋ผ๋ฒจ(์ •์ˆ˜ 0/1/2)
    • binary_class: ์ด์ง„ ๋ผ๋ฒจ. ๊ทœ์น™: binary_class = 0 if multi_class == 2 else 1
  • ์ฃผ์š” ํ”ผ์ฒ˜ ๊ทธ๋ฃน(์ฝ”๋“œ ๊ธฐ์ค€)

    • ๊ธฐ์ƒ(ASOS): temp_C, precip_mm, wind_speed, wind_dir(์ •์˜จโ†’0 ์น˜ํ™˜), hm, vap_pressure, dewpoint_C, loc_pressure, sea_pressure, solarRad, snow_cm, cloudcover(int), lm_cloudcover(int), low_cloudbase, groundtemp
    • ๋Œ€๊ธฐ์˜ค์—ผ(DataOn): O3, NO2, PM10, PM25
    • ์‹œ๊ฐ„/์ฃผ๊ธฐ: year(int), month(int), hour(int), hour_sin, hour_cos, month_sin, month_cos
    • ํŒŒ์ƒ: ground_temp - temp_C(์ง€๋ฉด-๊ธฐ์˜จ ์ฐจ)
  • ๋ฒ”์ฃผํ˜• ๋ณ€์ˆ˜(๋ชจ๋ธ/์ƒ˜ํ”Œ๋ง ๊ด€์ )

    • wind_dir, cloudcover, lm_cloudcover, ๊ทธ๋ฆฌ๊ณ  int ํƒ€์ž…์˜ ์‹œ๊ฐ„ ๋ณ€์ˆ˜(year, month, hour)๋Š” SMOTENC/GBDT์—์„œ ๋ฒ”์ฃผํ˜•์œผ๋กœ ์ทจ๊ธ‰๋จ(์ฝ”๋“œ์—์„œ float64๊ฐ€ ์•„๋‹Œ ์—ด ์ธ๋ฑ์Šค ์ž๋™ ํƒ์ง€)
  • ์ „์ฒ˜๋ฆฌ ๊ทœ์น™(๋ฐœ์ทŒ)

    • wind_dir ์ค‘ '์ •์˜จ'์€ "0"์œผ๋กœ ์น˜ํ™˜ ํ›„ ์ •์ˆ˜ํ˜• ๋ณ€ํ™˜
    • cloudcover, lm_cloudcover ์ •์ˆ˜ํ˜• ๋ณ€ํ™˜
    • ํ•™์Šต ์‹œ ํƒ€๊นƒ/๋ณด์กฐ ์—ด(multi_class, binary_class) ๋ถ„๋ฆฌ ํ›„ ํ•„์š” ์‹œ ์žฌ๊ณ„์‚ฐ
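The labeling rules above can be sketched in plain Python. This is a minimal illustration, assuming the example intervals listed for visi are the ones in effect; the function names are hypothetical and the project code presumably applies the same logic to DataFrame columns.

```python
def multi_class_from_visi(visi: float) -> int:
    """Map a visibility value to the 3-class label using the example intervals."""
    if 0 <= visi < 100:
        return 0
    if 100 <= visi < 500:
        return 1
    return 2  # "remaining range": everything outside [0, 500)


def binary_class_from_multi(multi_class: int) -> int:
    """Rule from the text: binary_class = 0 if multi_class == 2 else 1."""
    return 0 if multi_class == 2 else 1


# Illustrative values only (not taken from the project data).
for visi in [50.0, 100.0, 499.9, 500.0, 2000.0]:
    mc = multi_class_from_visi(visi)
    print(visi, mc, binary_class_from_multi(mc))
```

Under this rule the binary task separates low-visibility cases (classes 0 and 1) from everything else.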

EDA and Preprocessing

  • Merging/cleanup

    • Index column removal: drop Unnamed: 0
    • Dtype consistency: cloudcover, lm_cloudcover as integers; year, month, hour as integers
    • Special-value replacement: wind_dir == '정온' → "0", then cast to integer
  • Feature engineering

    • Cyclical encoding: hour_sin, hour_cos, month_sin, month_cos
    • Difference feature: ground_temp - temp_C
  • Distribution/trend analysis

    • Per-region time-series trends: find_reason/*_trend.ipynb
    • Distribution comparison/shift detection: find_reason/wasserstein_distance.ipynb (quantifies distribution differences using the Wasserstein distance)
  • Data splitting

    • Per-region datasets (*_train.csv, *_test.csv)
    • Generalization is validated with a year-based 3-fold hold-out (combinations of 2018–2020)

Imbalance Handling and Synthetic Sampling

  • SMOTENC

    • Categorical indices: the positional indices of the input feature columns whose dtype is not float64
    • Example sampling strategies: {0: 10000, 1: 10000, 2: original count}, or, depending on dataset size, {0: 500/1000, 1: ceil(n1/100)*100, 2: n2} as an initial augmentation that is followed by a final augmentation with CTGAN (cf. the first CTGAN item below)
    • Recomputation: after sampling, binary_class is rebuilt from multi_class and the cyclical/difference features are restored
  • CTGAN(+Optuna)

    • To secure enough initial samples for classes 0 and 1, a first-stage augmentation with SMOTENC is performed, and CTGAN is then applied for the final augmentation (cf. the SMOTENC sampling strategy above)
    • For classes 0 and 1, Optuna searches over embedding_dim, generator_dim, discriminator_dim, pac, batch_size, and discriminator_steps before synthesis
    • Quality filter on generated samples: class 0 → 0 ≤ visi < 100, class 1 → 100 ≤ visi < 500
    • After the final merge, derived/auxiliary features (binary_class, cyclical/difference items) are restored
  • Outputs

    • Per-region CSVs are saved under data/data_oversampled/smote/, ctgan7000/, ctgan10000/, ctgan20000/
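As a rough sketch, the second sampling strategy and the quality filter can be written as two small helpers. The function names and the minority_target parameter are hypothetical. The text gives 500 or 1000 for class 0 "depending on dataset size" without stating the cutoff, so the caller chooses. The strategy is only a valid oversampling target if class 0 currently has no more rows than minority_target.

```python
import math


def smotenc_strategy(class_counts: dict, minority_target: int = 500) -> dict:
    """Build {0: target, 1: ceil(n1/100)*100, 2: n2} from current class counts."""
    n1, n2 = class_counts[1], class_counts[2]
    return {0: minority_target, 1: math.ceil(n1 / 100) * 100, 2: n2}


def keep_synthetic(row: dict) -> bool:
    """Quality filter: keep a generated row only if visi matches its class interval."""
    if row["multi_class"] == 0:
        return 0 <= row["visi"] < 100
    if row["multi_class"] == 1:
        return 100 <= row["visi"] < 500
    return True  # assumption: other classes are not synthesized, so not filtered


# Illustrative counts and rows only (not taken from the project data).
strategy = smotenc_strategy({0: 37, 1: 1234, 2: 50000})
generated = [
    {"multi_class": 0, "visi": 80.0},
    {"multi_class": 0, "visi": 140.0},   # outside [0, 100): dropped
    {"multi_class": 1, "visi": 320.0},
    {"multi_class": 1, "visi": 650.0},   # outside [100, 500): dropped
]
kept = [row for row in generated if keep_synthetic(row)]
print(strategy, len(kept))
```

A dictionary of this shape is what imbalanced-learn's SMOTENC accepts as sampling_strategy, together with the categorical column indices described above.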

๋ชจ๋ธ ์•„ํ‚คํ…์ฒ˜(์ƒ์„ธ)

  • ๋”ฅ๋Ÿฌ๋‹(tabular)

    • resnet_like.py
      • ์ž…๋ ฅ: x_num [B, N_num], x_cat [B, N_cat] โ†’ concat โ†’ ์ž…๋ ฅ์„ ํ˜•(d_main=128) โ†’ ์ž”์ฐจ๋ธ”๋ก(n_blocks=4, d_hidden=64, dropout_first=0.25) โ†’ ์ถœ๋ ฅ์ธต
      • ์ถœ๋ ฅ: num_classes == 2 โ†’ 1 ๋กœ์ง“, > 2 โ†’ K ๋กœ์ง“
    • ft_transformer.py
      • ์ˆ˜์น˜: Linear(num_features โ†’ d_token=192), ๋ฒ”์ฃผ: cat_cardinalities๋ณ„ nn.Embedding(d_token) ํ›„ ํ•ฉ์„ฑ
      • ์ธ์ฝ”๋”: TransformerEncoderLayer(d_model=d_token, nhead=8, dropoutโ‰ˆ0.2) ร— n_blocks=6 โ†’ ํ‰๊ท  ํ’€๋ง โ†’ ๋ถ„๋ฅ˜ ํ—ค๋“œ
    • deepgbm.py
      • ์ˆ˜์น˜ Linear(d_main=128) + ๋ฒ”์ฃผ ์ž„๋ฒ ๋”ฉ ํ•ฉ์‚ฐ โ†’ ์ž”์ฐจ MLP ๋ธ”๋ก(n_blocks=4, d_hidden=64, dropoutโ‰ˆ0.2) โ†’ ๋ถ„๋ฅ˜ ํ—ค๋“œ
  • GBDT

    • LightGBM(optima/LGB_smote_seoul.py): objective='multiclassova', n_estimatorsโ‰ˆ4000, ์กฐ๊ธฐ์ข…๋ฃŒ, GPU ์˜ต์…˜ ์˜ˆ์‹œ ์กด์žฌ, hyperopt๋กœ max_depth, min_child_weight, num_leaves, subsample, learning_rate ํƒ์ƒ‰
    • XGBoost(optima/XGB_smote_seoul.py): objective='multi:softprob', tree_method='hist', enable_categorical=True, GPU ์˜ต์…˜, hyperopt๋กœ ํ•ต์‹ฌ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ํƒ์ƒ‰, eval_metric=CSI
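To make the resnet_like.py description concrete, here is a minimal PyTorch sketch built only from the listed hyperparameters. It is not the repository's implementation: the class names, the BatchNorm placement, and the ReLU activation are assumptions.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    """norm -> linear -> ReLU -> dropout -> linear, added back to the input."""

    def __init__(self, d_main: int, d_hidden: int, dropout_first: float):
        super().__init__()
        self.norm = nn.BatchNorm1d(d_main)  # assumed normalization
        self.linear_first = nn.Linear(d_main, d_hidden)
        self.dropout = nn.Dropout(dropout_first)
        self.linear_second = nn.Linear(d_hidden, d_main)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.relu(self.linear_first(self.norm(x)))
        z = self.linear_second(self.dropout(z))
        return x + z


class ResNetLikeSketch(nn.Module):
    def __init__(self, n_num, n_cat, num_classes,
                 d_main=128, d_hidden=64, n_blocks=4, dropout_first=0.25):
        super().__init__()
        self.first = nn.Linear(n_num + n_cat, d_main)
        self.blocks = nn.Sequential(
            *[ResidualBlock(d_main, d_hidden, dropout_first) for _ in range(n_blocks)]
        )
        # Binary task -> a single logit; multiclass -> one logit per class.
        self.head = nn.Linear(d_main, 1 if num_classes == 2 else num_classes)

    def forward(self, x_num: torch.Tensor, x_cat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([x_num, x_cat.float()], dim=1)  # concat numeric and categorical
        return self.head(self.blocks(self.first(x)))


torch.manual_seed(0)
x_num = torch.randn(5, 4)             # [B, N_num], float32
x_cat = torch.randint(0, 3, (5, 2))   # [B, N_cat], integer indices
multi_logits = ResNetLikeSketch(4, 2, num_classes=3).eval()(x_num, x_cat)
binary_logits = ResNetLikeSketch(4, 2, num_classes=2).eval()(x_num, x_cat)
print(multi_logits.shape, binary_logits.shape)
```

The single-logit output pairs naturally with a sigmoid-based loss and the K-logit output with a softmax-based one, but the actual loss and threshold settings are defined in the notebooks.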

Training/Validation Strategy

  • Year-based 3-fold hold-out (example)
    • Fold1: Train 2018–2019 → Val 2020
    • Fold2: Train 2018, 2020 → Val 2019
    • Fold3: Train 2019–2020 → Val 2018
  • Models are trained separately per region (e.g., seoul_train.csv)
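The fold layout above amounts to holding out one year at a time. A minimal sketch, assuming each row carries an integer year field; the helper names are hypothetical:

```python
def year_holdout_folds(years):
    """Return (train_years, val_year) pairs, holding out one year per fold."""
    ordered = sorted(years)
    folds = []
    for val_year in reversed(ordered):  # latest year is held out first
        train_years = [y for y in ordered if y != val_year]
        folds.append((train_years, val_year))
    return folds


def split_rows(rows, train_years, val_year):
    """Split rows (dicts with a 'year' key) into train and validation lists."""
    train = [r for r in rows if r["year"] in train_years]
    val = [r for r in rows if r["year"] == val_year]
    return train, val


folds = year_holdout_folds([2018, 2019, 2020])
for i, (train_years, val_year) in enumerate(folds, start=1):
    print(f"Fold{i}: train {train_years} -> val {val_year}")

# Illustrative rows only (not taken from the project data).
rows = [{"year": 2018}, {"year": 2019}, {"year": 2019}, {"year": 2020}]
train, val = split_rows(rows, *folds[1])
```

Holding out whole years, rather than random rows, keeps hours from the same weather episode out of both sides of the split.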

Evaluation Metrics

  • Custom CSI (Critical Success Index), multiclass version
H = cm[0, 0] + cm[1, 1]
F = (cm[1, 0] + cm[2, 0] + cm[0, 1] + cm[2, 1])
M = (cm[0, 2] + cm[1, 2])
CSI = H / (H + F + M + 1e-10)
  • Others: accuracy, F1, and similar metrics are also checked in the notebooks/scripts
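The CSI expression above can be wrapped into a self-contained function. This sketch assumes the confusion matrix follows the common convention of rows = true class and columns = predicted class (as scikit-learn's confusion_matrix produces). The helper names are hypothetical.

```python
def confusion_matrix_3(y_true, y_pred):
    """3x3 confusion matrix as nested lists: cm[true][pred]."""
    cm = [[0] * 3 for _ in range(3)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def csi_multiclass(cm):
    """CSI over the low-visibility classes 0 and 1, as defined in the text."""
    hits = cm[0][0] + cm[1][1]
    false_alarms = cm[1][0] + cm[2][0] + cm[0][1] + cm[2][1]
    misses = cm[0][2] + cm[1][2]
    return hits / (hits + false_alarms + misses + 1e-10)  # epsilon avoids 0/0


# Illustrative labels only (not taken from the project data).
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 2, 1, 0, 2, 2, 1, 2]
cm = confusion_matrix_3(y_true, y_pred)
csi = csi_multiclass(cm)
print(cm, round(csi, 3))
```

Correct class-2 predictions (cm[2][2]) do not enter the score, so the metric is not inflated by the dominant normal-visibility class.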

How to Run (Details)

  • Environment setup

    • Python 3.8+ is recommended; with CUDA support a GPU can be used (speeds up CTGAN/GBDT)
    • If the GPU build of LightGBM is not installed, either use the CPU version via pip install lightgbm or build LightGBM with GPU support
  • Data preparation

    • data/ASOS/: raw meteorological data by year
    • data/dataon/: daily air-pollution CSVs (large)
    • data/data_for_modeling/: per-region train/evaluation sets (*_train.csv, *_test.csv, df_*.feather)
    • data/data_for_TAF/: per-airport TAF (aviation weather)
  • Preprocessing/exploration

    • Analysis_code/0.air_data_merge.ipynb → 1.data_merge.ipynb → 2.eda_preproccesing.ipynb
  • Oversampling

    • Run the scripts in Analysis_code/make_oversample_data/ (see the TL;DR above)
  • GBDT tuning/training

    • Run Analysis_code/optima/LGB_smote_seoul.py and XGB_smote_seoul.py
    • Resulting models are saved as .pkl files under Analysis_code/save_model/
  • Deep learning training

    • Train/evaluate the tabular models in the deeplearning_model_* notebooks and visualize with model_visualize.ipynb
  • Ensemble/final evaluation

    • model_voting_test_best_sample/ensemble__voting_best_sample.ipynb
    • final_test/final.ipynb

๋ชจ๋ธ ์ž…์ถœ๋ ฅ ๊ทœ๊ฒฉ(์š”์•ฝ)

  • ์ˆ˜์น˜ ์ž…๋ ฅ x_num: float32 ํ…์„œ [batch, num_numeric_features]
  • ๋ฒ”์ฃผ ์ž…๋ ฅ x_cat: ์ •์ˆ˜ ์ธ๋ฑ์Šค ํ…์„œ [batch, num_categorical_features]
  • ์ถœ๋ ฅ: ์ด์ง„(1 ๋กœ์ง“) ๋˜๋Š” ๋‹ค์ค‘๋ถ„๋ฅ˜(K ๋กœ์ง“). ์†์‹ค/์ž„๊ณ„๊ฐ’์€ ๋…ธํŠธ๋ถ ๋‚ด ์„ค์ • ์ฐธ๊ณ 

Reproducibility/Seeds

  • Fixed values are used, such as random_state=42 (SMOTENC) and random_state=120 in the model scripts
  • Results may not reproduce exactly across data/hardware differences, so setting the fold and seed explicitly is recommended

์ฃผ์˜/ํŠธ๋Ÿฌ๋ธ”์ŠˆํŒ…

  • optima/LGB_smote_seoul.py์˜ sys.path.append(...)๋Š” ํ™˜๊ฒฝ ์˜์กด์  ๊ฒฝ๋กœ์ž…๋‹ˆ๋‹ค. ์ผ๋ฐ˜ ํ™˜๊ฒฝ์—์„œ๋Š” ์ œ๊ฑฐํ•ด๋„ from lightgbm import LGBMClassifier๊ฐ€ ๋™์ž‘ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
  • ์Šคํฌ๋ฆฝํŠธ๋Š” ์ƒ๋Œ€ ๊ฒฝ๋กœ๋ฅผ ๊ฐ€์ •ํ•ฉ๋‹ˆ๋‹ค. ์‹คํ–‰ ์ „ ํ˜„์žฌ ์ž‘์—… ๋””๋ ‰ํ„ฐ๋ฆฌ๊ฐ€ Analysis_code/* ํ•˜์œ„์ธ์ง€ ํ™•์ธํ•˜์„ธ์š”.
  • wind_dir์˜ '์ •์˜จ' ๊ฐ’ ์น˜ํ™˜/ํ˜•๋ณ€ํ™˜์ด ๋ˆ„๋ฝ๋˜๋ฉด GBDT/XGB์—์„œ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • dataon/์€ ๋งค์šฐ ๋Œ€์šฉ๋Ÿ‰์ž…๋‹ˆ๋‹ค. ๋ฉ”๋ชจ๋ฆฌ ์—ฌ์œ ๋ฅผ ํ™•๋ณดํ•˜๊ฑฐ๋‚˜ ์—ฐ๋„/์ง€์—ญ ๋‹จ์œ„๋กœ ์ฒ˜๋ฆฌํ•˜์„ธ์š”.

Dependencies

  • Python 3.8+
  • PyTorch, pandas, numpy, scikit-learn, imbalanced-learn, optuna, ctgan, xgboost, lightgbm, joblib, matplotlib, seaborn

๋ผ์ด์„ ์Šค/์ธ์šฉ

  • ๋ผ์ด์„ ์Šค: ์ถ”ํ›„ ์—…๋ฐ์ดํŠธ ์˜ˆ์ •
  • ๋ณธ ํ”„๋กœ์ ํŠธ/๊ฒฐ๊ณผ๋ฌผ์„ ์ธ์šฉ ์‹œ visibility_prediction ์ €์žฅ์†Œ์™€ ์‚ฌ์šฉ๋œ ๋ฐ์ดํ„ฐ ์†Œ์Šค(ASOS, DataOn, TAF)๋ฅผ ๋ช…์‹œํ•ด ์ฃผ์„ธ์š”.