license: mit
library_name: sklearn
tags:
- sklearn
- skops
- text-classification
model_format: pickle
model_file: model.pkl
Model description
[More Information Needed]
Intended uses & limitations
[More Information Needed]
Training Procedure
[More Information Needed]
Hyperparameters
| Hyperparameter | Value |
|---|---|
| memory | |
| steps | [('tfidf', TfidfVectorizer(min_df=100, ngram_range=(1, 3), preprocessor=<function preprocessor at 0x7fa438e7a280>)), ('classifier', XGBClassifier(base_score=None, booster=None, callbacks=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, device=None, early_stopping_rounds=None, enable_categorical=False, eval_metric=None, feature_types=None, gamma=None, grow_policy=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None, max_delta_step=None, max_depth=None, max_leaves=None, min_child_weight=None, missing=nan, monotone_constraints=None, multi_strategy=None, n_estimators=None, n_jobs=None, num_parallel_tree=None, random_state=None, ...))] |
| verbose | True |
| tfidf | TfidfVectorizer(min_df=100, ngram_range=(1, 3), preprocessor=<function preprocessor at 0x7fa438e7a280>) |
| classifier | XGBClassifier(base_score=None, booster=None, callbacks=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, device=None, early_stopping_rounds=None, enable_categorical=False, eval_metric=None, feature_types=None, gamma=None, grow_policy=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None, max_delta_step=None, max_depth=None, max_leaves=None, min_child_weight=None, missing=nan, monotone_constraints=None, multi_strategy=None, n_estimators=None, n_jobs=None, num_parallel_tree=None, random_state=None, ...) |
| tfidf__analyzer | word |
| tfidf__binary | False |
| tfidf__decode_error | strict |
| tfidf__dtype | <class 'numpy.float64'> |
| tfidf__encoding | utf-8 |
| tfidf__input | content |
| tfidf__lowercase | True |
| tfidf__max_df | 1.0 |
| tfidf__max_features | |
| tfidf__min_df | 100 |
| tfidf__ngram_range | (1, 3) |
| tfidf__norm | l2 |
| tfidf__preprocessor | <function preprocessor at 0x7fa438e7a280> |
| tfidf__smooth_idf | True |
| tfidf__stop_words | |
| tfidf__strip_accents | |
| tfidf__sublinear_tf | False |
| tfidf__token_pattern | (?u)\b\w\w+\b |
| tfidf__tokenizer | |
| tfidf__use_idf | True |
| tfidf__vocabulary | |
| classifier__objective | binary:logistic |
| classifier__base_score | |
| classifier__booster | |
| classifier__callbacks | |
| classifier__colsample_bylevel | |
| classifier__colsample_bynode | |
| classifier__colsample_bytree | |
| classifier__device | |
| classifier__early_stopping_rounds | |
| classifier__enable_categorical | False |
| classifier__eval_metric | |
| classifier__feature_types | |
| classifier__gamma | |
| classifier__grow_policy | |
| classifier__importance_type | |
| classifier__interaction_constraints | |
| classifier__learning_rate | |
| classifier__max_bin | |
| classifier__max_cat_threshold | |
| classifier__max_cat_to_onehot | |
| classifier__max_delta_step | |
| classifier__max_depth | |
| classifier__max_leaves | |
| classifier__min_child_weight | |
| classifier__missing | nan |
| classifier__monotone_constraints | |
| classifier__multi_strategy | |
| classifier__n_estimators | |
| classifier__n_jobs | |
| classifier__num_parallel_tree | |
| classifier__random_state | |
| classifier__reg_alpha | |
| classifier__reg_lambda | |
| classifier__sampling_method | |
| classifier__scale_pos_weight | |
| classifier__subsample | |
| classifier__tree_method | |
| classifier__validate_parameters | |
| classifier__verbosity | |
Model Plot
Pipeline(steps=[('tfidf', TfidfVectorizer(min_df=100, ngram_range=(1, 3), preprocessor=<function preprocessor at 0x7fa438e7a280>)), ('classifier', XGBClassifier(base_score=None, booster=None, callbacks=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, device=None, early_stopping_rounds=None, enable_categorical=False, eval_metric=None, featur...importance_type=None, interaction_constraints=None, learning_rate=None, max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None, max_delta_step=None, max_depth=None, max_leaves=None, min_child_weight=None, missing=nan, monotone_constraints=None, multi_strategy=None, n_estimators=None, n_jobs=None, num_parallel_tree=None, random_state=None, ...))], verbose=True)
Evaluation Results
| Metric | Value |
|---|---|
| accuracy | 0.910317 |
| f1 score | 0.910317 |
How to Get Started with the Model
[More Information Needed]
Model Card Authors
This model card is written by the following authors:
[More Information Needed]
Model Card Contact
You can contact the model card authors through the following channels: [More Information Needed]
Citation
Below you can find information related to citation.
BibTeX:
[More Information Needed]
get_started_code
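import sklearn
import dill as pickle  # dill is used under the name "pickle" to load the model
from pathlib import Path
from skops import hub_utils

# Download the model repository from the Hugging Face Hub into a local folder
suicide_detector_repo = Path("./suicide-detector")
hub_utils.download(
    repo_id="AndyJamesTurner/suicideDetector",
    dst=suicide_detector_repo,
)

# Load the fitted pipeline and classify a single text
with open(suicide_detector_repo / "model.pkl", "rb") as file:
    clf = pickle.load(file)
classification = clf.predict(["I want to kill myself"])[0]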
model_card_authors
Andy Turner
model_description
Suicide detection text classification model.
Trained on the Suicide and Depression Detection dataset (https://www.kaggle.com/datasets/nikhileswarkomati/suicide-watch).
The model converts each text into TF-IDF features over word n-grams of length 1 to 3 (terms must appear in at least 100 training documents) and then classifies the resulting vector with an XGBoost binary classifier.
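To make the vectorisation step concrete, here is a minimal standard-library sketch of TF-IDF weighting with the settings listed in the hyperparameter table (word n-grams of length 1 to 3, smoothed IDF, L2 normalisation). It is an illustration, not the scikit-learn implementation the model uses. The toy corpus, the helper names and `MIN_DF = 2` are assumptions made for the example (the card uses `min_df=100`), and plain lowercasing stands in for the card's custom `preprocessor` function, which is not shown.

```python
import math
import re
from collections import Counter

# Token pattern and n-gram range copied from the hyperparameter table.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")
NGRAM_RANGE = (1, 3)
MIN_DF = 2  # assumption for the toy corpus; the card uses min_df=100


def ngrams(text):
    """Lowercase, tokenize, and emit word n-grams for n in NGRAM_RANGE."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    lo, hi = NGRAM_RANGE
    return [
        " ".join(tokens[i:i + n])
        for n in range(lo, hi + 1)
        for i in range(len(tokens) - n + 1)
    ]


def fit_idf(corpus):
    """Keep terms found in at least MIN_DF documents; compute smoothed IDF."""
    doc_freq = Counter()
    for text in corpus:
        doc_freq.update(set(ngrams(text)))
    n_docs = len(corpus)
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
        if df >= MIN_DF
    }


def transform(text, idf):
    """Return an L2-normalised {term: weight} vector for one document."""
    counts = Counter(t for t in ngrams(text) if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


# Hypothetical toy corpus, not taken from the training data.
corpus = [
    "feeling tired today",
    "feeling tired and alone",
    "great day at the park",
]
idf = fit_idf(corpus)
vector = transform("feeling tired again", idf)
print(vector)
```

Only terms that pass the document-frequency cut-off end up in the vocabulary, so unseen words such as "again" contribute nothing to the vector. In the real pipeline, the resulting sparse vector is what the classifier receives.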
eval_method
The model was evaluated on a 30% holdout split (69,623 texts) using F1 score, accuracy, a confusion matrix and ROC curves.
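As a rough illustration of this evaluation setup, the sketch below shows a holdout split and an ROC AUC computation using only the standard library. The card does not say which splitting utility, random seed or ROC implementation was used, so the function names, the seed and the toy labels and scores are all assumptions.

```python
import random


def holdout_split(items, test_fraction=0.3, seed=0):
    """Shuffle a copy of items and split off test_fraction as the test set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking of positives vs negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Each positive/negative pair counts 1 if ranked correctly, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: ten item ids, plus four labels with predicted scores.
train, test = holdout_split(range(10))
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(len(train), len(test), auc)
```

The pairwise formulation is quadratic in the number of samples, so it is suited to small examples only. For a test set the size of the one reported here, a sort-based implementation would be the practical choice.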
Confusion matrix
ROC Curve
Classification Report
| index | precision | recall | f1-score | support |
|---|---|---|---|---|
| not suicide | 0.891721 | 0.934126 | 0.912431 | 34824 |
| suicide | 0.930785 | 0.886491 | 0.908098 | 34799 |
| macro avg | 0.911253 | 0.910308 | 0.910265 | 69623 |
| weighted avg | 0.911246 | 0.910317 | 0.910265 | 69623 |
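The per-class recall and support values in the report are enough to recover approximate confusion-matrix counts and to cross-check the other figures. The sketch below does that arithmetic. The counts it produces are rounded reconstructions, not numbers stated in the card, and the variable names treat "suicide" as the positive class, which is an assumption.

```python
# Per-class recall and support, copied from the classification report.
report = {
    "not suicide": {"recall": 0.934126, "support": 34824},
    "suicide": {"recall": 0.886491, "support": 34799},
}

# Correct predictions per class: recall * support, rounded to whole texts.
tn = round(report["not suicide"]["recall"] * report["not suicide"]["support"])
tp = round(report["suicide"]["recall"] * report["suicide"]["support"])
fp = report["not suicide"]["support"] - tn  # "not suicide" texts flagged as "suicide"
fn = report["suicide"]["support"] - tp      # "suicide" texts that were missed

total = tn + fp + fn + tp
accuracy = (tn + tp) / total
precision_suicide = tp / (tp + fp)
precision_not_suicide = tn / (tn + fn)
f1_suicide = 2 * tp / (2 * tp + fp + fn)
f1_not_suicide = 2 * tn / (2 * tn + fp + fn)
macro_f1 = (f1_suicide + f1_not_suicide) / 2

print(f"confusion matrix: [[{tn}, {fp}], [{fn}, {tp}]]")
print(f"accuracy={accuracy:.6f} macro_f1={macro_f1:.6f}")
```

The reconstructed precision, F1 and accuracy values agree with the table to the reported precision, which suggests the report is internally consistent. With single-label predictions, micro-averaged F1 equals accuracy, which may explain why the Evaluation Results table lists the same value for both metrics. The card does not state which averaging was used.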

