suicideDetector / README.md
metadata
license: mit
library_name: sklearn
tags:
  - sklearn
  - skops
  - text-classification
model_format: pickle
model_file: model.pkl

Model description

[More Information Needed]

Intended uses & limitations

[More Information Needed]

Training Procedure

[More Information Needed]

Hyperparameters

Hyperparameter Value
memory
steps [('tfidf', TfidfVectorizer(min_df=100, ngram_range=(1, 3),
preprocessor=<function preprocessor at 0x7fa438e7a280>)), ('classifier', XGBClassifier(base_score=None, booster=None, callbacks=None,
colsample_bylevel=None, colsample_bynode=None,
colsample_bytree=None, device=None, early_stopping_rounds=None,
enable_categorical=False, eval_metric=None, feature_types=None,
gamma=None, grow_policy=None, importance_type=None,
interaction_constraints=None, learning_rate=None, max_bin=None,
max_cat_threshold=None, max_cat_to_onehot=None,
max_delta_step=None, max_depth=None, max_leaves=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
multi_strategy=None, n_estimators=None, n_jobs=None,
num_parallel_tree=None, random_state=None, ...))]
verbose True
tfidf TfidfVectorizer(min_df=100, ngram_range=(1, 3),
preprocessor=<function preprocessor at 0x7fa438e7a280>)
classifier XGBClassifier(base_score=None, booster=None, callbacks=None,
colsample_bylevel=None, colsample_bynode=None,
colsample_bytree=None, device=None, early_stopping_rounds=None,
enable_categorical=False, eval_metric=None, feature_types=None,
gamma=None, grow_policy=None, importance_type=None,
interaction_constraints=None, learning_rate=None, max_bin=None,
max_cat_threshold=None, max_cat_to_onehot=None,
max_delta_step=None, max_depth=None, max_leaves=None,
min_child_weight=None, missing=nan, monotone_constraints=None,
multi_strategy=None, n_estimators=None, n_jobs=None,
num_parallel_tree=None, random_state=None, ...)
tfidf__analyzer word
tfidf__binary False
tfidf__decode_error strict
tfidf__dtype <class 'numpy.float64'>
tfidf__encoding utf-8
tfidf__input content
tfidf__lowercase True
tfidf__max_df 1.0
tfidf__max_features
tfidf__min_df 100
tfidf__ngram_range (1, 3)
tfidf__norm l2
tfidf__preprocessor <function preprocessor at 0x7fa438e7a280>
tfidf__smooth_idf True
tfidf__stop_words
tfidf__strip_accents
tfidf__sublinear_tf False
tfidf__token_pattern (?u)\b\w\w+\b
tfidf__tokenizer
tfidf__use_idf True
tfidf__vocabulary
classifier__objective binary:logistic
classifier__base_score
classifier__booster
classifier__callbacks
classifier__colsample_bylevel
classifier__colsample_bynode
classifier__colsample_bytree
classifier__device
classifier__early_stopping_rounds
classifier__enable_categorical False
classifier__eval_metric
classifier__feature_types
classifier__gamma
classifier__grow_policy
classifier__importance_type
classifier__interaction_constraints
classifier__learning_rate
classifier__max_bin
classifier__max_cat_threshold
classifier__max_cat_to_onehot
classifier__max_delta_step
classifier__max_depth
classifier__max_leaves
classifier__min_child_weight
classifier__missing nan
classifier__monotone_constraints
classifier__multi_strategy
classifier__n_estimators
classifier__n_jobs
classifier__num_parallel_tree
classifier__random_state
classifier__reg_alpha
classifier__reg_lambda
classifier__sampling_method
classifier__scale_pos_weight
classifier__subsample
classifier__tree_method
classifier__validate_parameters
classifier__verbosity
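
As a rough illustration of what the `tfidf__token_pattern` and `tfidf__ngram_range` values above mean for feature extraction, the following standard-library sketch mimics scikit-learn's documented word-analyzer behaviour. It is not the model's code: the custom `preprocessor` function is not shown in this card, so plain lowercasing stands in for it here (an assumption), and `word_ngrams` is a hypothetical helper name. Note that scikit-learn documents a custom preprocessor as overriding the built-in lowercasing and accent-stripping stage, so `tfidf__lowercase` may have no effect in the real pipeline.

```python
import re

# Same token pattern as tfidf__token_pattern: words of two or more characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def word_ngrams(text, ngram_range=(1, 3)):
    """Return word n-grams for one text: unigrams first, then longer n-grams."""
    # Assumption: lowercasing stands in for the card's unknown preprocessor.
    tokens = TOKEN_PATTERN.findall(text.lower())
    min_n, max_n = ngram_range
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens across the text.
        for start in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[start:start + n]))
    return ngrams


features = word_ngrams("I want to sleep")
print(features)
```

The single-character token "I" is dropped by the token pattern, so the text yields three unigrams, two bigrams and one trigram. With `min_df=100`, an n-gram only enters the vocabulary if it occurs in at least 100 training documents.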

Model Plot

Pipeline(steps=[('tfidf',TfidfVectorizer(min_df=100, ngram_range=(1, 3),preprocessor=<function preprocessor at 0x7fa438e7a280>)),('classifier',XGBClassifier(base_score=None, booster=None, callbacks=None,colsample_bylevel=None, colsample_bynode=None,colsample_bytree=None, device=None,early_stopping_rounds=None,enable_categorical=False, eval_metric=None,featur...importance_type=None,interaction_constraints=None, learning_rate=None,max_bin=None, max_cat_threshold=None,max_cat_to_onehot=None, max_delta_step=None,max_depth=None, max_leaves=None,min_child_weight=None, missing=nan,monotone_constraints=None, multi_strategy=None,n_estimators=None, n_jobs=None,num_parallel_tree=None, random_state=None, ...))],verbose=True)

Evaluation Results

Metric Value
accuracy 0.910317
f1 score 0.910317
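
The reported accuracy and f1 score are identical to six decimal places. One possible explanation (an assumption; the card does not state which averaging was used) is that the f1 score is micro-averaged, since micro-averaged F1 always equals accuracy when every example has exactly one label. A minimal sketch on made-up labels, with hypothetical helper names:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    """F1 computed from true/false positives pooled over all labels.

    Assumes at least one correct prediction, so no division by zero occurs.
    """
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            tp += int(p == label and t == label)
            fp += int(p == label and t != label)
            fn += int(p != label and t == label)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only; these are not the model's predictions.
y_true = ["suicide", "not suicide", "suicide", "not suicide", "suicide"]
y_pred = ["suicide", "suicide", "not suicide", "not suicide", "suicide"]

toy_accuracy = accuracy(y_true, y_pred)
toy_micro_f1 = micro_f1(y_true, y_pred)
print(toy_accuracy, toy_micro_f1)
```

Each misclassified example counts once as a false positive for the predicted label and once as a false negative for the true label, so pooled precision and recall both reduce to the accuracy.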

How to Get Started with the Model

[More Information Needed]

Model Card Authors

This model card was written by the following authors:

[More Information Needed]

Model Card Contact

You can contact the model card authors through the following channels: [More Information Needed]

Citation

Below you can find information related to citation.

BibTeX:

[More Information Needed]

get_started_code

import sklearn
import dill as pickle

from pathlib import Path
from skops import hub_utils

suicide_detector_repo = Path("./suicide-detector")

hub_utils.download(
    repo_id="AndyJamesTurner/suicideDetector",
    dst=suicide_detector_repo,
)

with open(suicide_detector_repo / "model.pkl", "rb") as file:
    clf = pickle.load(file)

classification = clf.predict(["I want to kill myself"])[0]

model_card_authors

Andy Turner

model_description

Suicide Detection text classification model.

Trained on the Suicide and Depression Detection dataset (https://www.kaggle.com/datasets/nikhileswarkomati/suicide-watch)

The model vectorises each text using a fitted TF-IDF vectorizer and then classifies it using XGBoost.

eval_method

The model was evaluated on a 30% holdout split (69,623 examples) using F1 score, accuracy, a confusion matrix, and ROC curves.

Confusion matrix

[Figure: confusion matrix]

ROC Curve

[Figure: ROC curve]

Classification Report

index precision recall f1-score support
not suicide 0.891721 0.934126 0.912431 34824
suicide 0.930785 0.886491 0.908098 34799
macro avg 0.911253 0.910308 0.910265 69623
weighted avg 0.911246 0.910317 0.910265 69623
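
The per-class counts behind the confusion matrix can be approximately recovered from this report, since recall times support gives the number of correctly classified examples in each class. The sketch below does that arithmetic, treating "suicide" as the positive class. The resulting counts are inferred by rounding, not taken from the card, and the variable names are illustrative.

```python
# Recall and support for each class, copied from the classification report.
report = {
    "not suicide": {"recall": 0.934126, "support": 34824},
    "suicide": {"recall": 0.886491, "support": 34799},
}

# Correct predictions per class = recall * support, rounded to whole examples.
true_negatives = round(report["not suicide"]["recall"] * report["not suicide"]["support"])
true_positives = round(report["suicide"]["recall"] * report["suicide"]["support"])

# Whatever remains of each class's support was misclassified.
false_positives = report["not suicide"]["support"] - true_negatives
false_negatives = report["suicide"]["support"] - true_positives

total = sum(row["support"] for row in report.values())
accuracy = (true_positives + true_negatives) / total
precision_suicide = true_positives / (true_positives + false_positives)

print(f"TN={true_negatives} FP={false_positives} FN={false_negatives} TP={true_positives}")
print(f"accuracy={accuracy:.6f} precision(suicide)={precision_suicide:.6f}")
```

Under these assumptions the recomputed accuracy and precision round to the values reported in the card (0.910317 and 0.930785), which suggests the inferred counts are consistent with the report.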