JonnyBP committed on
Commit 58e2c7a · 1 parent: b2f13a2

fix: use C=0.1 for the training baseline and add custom_stopwords. #5
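The message refers to the `C: 0.1` setting for `logistic_regression` in `configs/models.yaml`. The following assumes the baseline is scikit-learn's `LogisticRegression` with its default `penalty='l2'`, since the config lists `solver: lbfgs` but no penalty. Under that assumption, the binary objective given in the scikit-learn user guide (labels y_i in {-1, 1}, coefficients w, intercept c) is approximately:

```latex
\min_{w,\,c}\; \frac{1}{2}\, w^\top w \;+\; C \sum_{i=1}^{n} \log\!\left(1 + \exp\!\left(-y_i\,(x_i^\top w + c)\right)\right)
```

C multiplies only the data-fit term. Lowering it from the default 1.0 to 0.1 therefore weights the (1/2) w^T w penalty ten times more heavily relative to the loss, which produces smaller coefficients and a more strongly regularized baseline. With `class_weight: balanced`, each log-loss term is also scaled by the weight of its class.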

configs/features.yaml CHANGED
@@ -8,6 +8,19 @@ preprocessing:
  lemmatize: true
  min_token_length: 2
  language: en
+ custom_stopwords:
+ - youtube
+ - video
+ - watch
+ - like
+ - comment
+ - channel
+ - stefan
+ - peggy
+ - masri
+ - ferguson
+ - cigar
+ - hubbard
 
  vectorization:
  method: tfidf # tfidf | bow | both
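This diff adds a `custom_stopwords` list under `preprocessing`. The sketch below shows one way such a list could be applied, together with the existing `min_token_length: 2` rule. It assumes the custom words are merged with a base English stopword set and matched after lowercasing. `build_stopwords`, `filter_tokens`, and the small `BASE_STOPWORDS` set are hypothetical stand-ins and are not taken from the project's notebooks.

```python
from collections.abc import Iterable

# Hypothetical stand-in for a real base English stopword list (e.g. NLTK's or spaCy's).
BASE_STOPWORDS = frozenset({"the", "a", "this", "and", "is"})

# Mirrors the custom_stopwords entries added to configs/features.yaml in this commit.
CUSTOM_STOPWORDS = [
    "youtube", "video", "watch", "like", "comment", "channel",
    "stefan", "peggy", "masri", "ferguson", "cigar", "hubbard",
]


def build_stopwords(base: Iterable[str], custom: Iterable[str]) -> frozenset[str]:
    """Union of base and custom stopwords, lowercased for case-insensitive matching."""
    return frozenset(w.lower() for w in base) | frozenset(w.lower() for w in custom)


def filter_tokens(
    tokens: Iterable[str], stopwords: frozenset[str], min_token_length: int = 2
) -> list[str]:
    """Lowercase each token, then drop stopwords and tokens shorter than min_token_length."""
    kept = []
    for tok in tokens:
        low = tok.lower()
        if len(low) >= min_token_length and low not in stopwords:
            kept.append(low)
    return kept


if __name__ == "__main__":
    stop = build_stopwords(BASE_STOPWORDS, CUSTOM_STOPWORDS)
    print(filter_tokens(["Watch", "this", "Stefan", "cigar", "review", "x"], stop))
    # prints ['review']
```

Tokens are lowercased before the lookup, so "Watch" and "watch" are both removed. This diff does not show whether filtering runs before or after `lemmatize: true`. If it runs after, the stopword entries should be in lemma form: "video" would catch a lemmatized "videos", but an unlemmatized "videos" would slip through.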
configs/models.yaml ADDED
@@ -0,0 +1,30 @@
+ models:
+ logistic_regression:
+ C: 0.1
+ max_iter: 1000
+ class_weight: balanced
+ solver: lbfgs
+
+ random_forest:
+ n_estimators: 100
+ max_depth: null
+ min_samples_split: 2
+ class_weight: balanced
+ n_jobs: -1
+
+ xgboost:
+ n_estimators: 100
+ max_depth: 6
+ learning_rate: 0.1
+ subsample: 0.8
+ colsample_bytree: 0.8
+ scale_pos_weight: 1
+
+ evaluation:
+ primary_metric: f1_weighted
+ metrics:
+ - accuracy
+ - f1_weighted
+ - precision_weighted
+ - recall_weighted
+ - roc_auc
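`configs/models.yaml` sets `class_weight: balanced` for the two scikit-learn models, leaves XGBoost's `scale_pos_weight` at 1, and makes `f1_weighted` the primary metric. The pure-Python sketch below reproduces what those settings compute, following the formulas in the scikit-learn and XGBoost documentation. The function names are hypothetical. The project most likely uses the libraries' own implementations instead, for example `sklearn.metrics.f1_score(..., average="weighted")`. `suggested_scale_pos_weight` assumes binary labels 0/1.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def balanced_class_weights(y: Sequence[Hashable]) -> dict[Hashable, float]:
    """Weights used by class_weight='balanced': n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def suggested_scale_pos_weight(y: Sequence[int]) -> float:
    """Starting value suggested by the XGBoost docs: #negatives / #positives (labels 0/1)."""
    pos = sum(1 for label in y if label == 1)
    if pos == 0:
        raise ValueError("no positive labels (1) in y")
    return (len(y) - pos) / pos


def f1_weighted(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1, averaged with each class weighted by its support in y_true."""
    total = 0.0
    for cls, n_true in Counter(y_true).items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0  # zero-division treated as 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += n_true * f1
    return total / len(y_true)


if __name__ == "__main__":
    y_true = [0, 0, 0, 1]
    y_pred = [0, 0, 1, 1]
    print(balanced_class_weights(y_true))
    print(suggested_scale_pos_weight(y_true))
    print(round(f1_weighted(y_true, y_pred), 4))  # prints 0.7667
```

`f1_weighted` weights each class by its support, so it follows the majority class more closely than macro-averaged F1 does. `class_weight: balanced` pushes in the opposite direction during training by up-weighting the minority class. With `scale_pos_weight: 1`, the XGBoost model applies no comparable reweighting.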
notebooks/02_preprocessing_v2.ipynb CHANGED
notebooks/03_vectorization_v2.ipynb CHANGED
reports/v2/07_tokens_comparativa.png CHANGED

Git LFS Details

  • SHA256: ec0a6ea1ef17dca1a15d8ccd4146d6db93992e18116d74251e89f146c47d6b5d
  • Pointer size: 130 Bytes
  • Size of remote file: 66.1 kB

Git LFS Details

  • SHA256: 398bf94aac1d980c76d7ea2a3234718e87784e1b13a41915d891c2b7d6316e86
  • Pointer size: 130 Bytes
  • Size of remote file: 65.3 kB
reports/v2/08_tfidf_top_features.png CHANGED

Git LFS Details

  • SHA256: fee0436db93240ad29f5f6d513e1fd54c79f1586a5245433c88634ad7bca5927
  • Pointer size: 130 Bytes
  • Size of remote file: 92.8 kB

Git LFS Details

  • SHA256: 7c93688b0bee2bb59e729d835fed338855c9b624dab3be4459381af3bc17c83d
  • Pointer size: 130 Bytes
  • Size of remote file: 89.3 kB
reports/v2/09_cv_metrics.png ADDED

Git LFS Details

  • SHA256: 6c1ddcaeec73493964b5d293c9676cfb57b4a1e4cd7cc280aaacff9c3c93e956
  • Pointer size: 130 Bytes
  • Size of remote file: 62 kB
reports/v2/10_baseline_confusion_roc.png ADDED

Git LFS Details

  • SHA256: 59e019a9124b8ef2b4a9bdb6714425388130f8934ceca0b4a9055455f6674915
  • Pointer size: 130 Bytes
  • Size of remote file: 77.4 kB
reports/v2/11_lr_coeficientes.png ADDED

Git LFS Details

  • SHA256: 31143308b65a29e757c5c19264005a1376d5ed6741805395229280a1301f00c2
  • Pointer size: 130 Bytes
  • Size of remote file: 96.3 kB