suicideDetector / README.md

AndyJamesTurner

metadata

license: mit
library_name: sklearn
tags:
  - sklearn
  - skops
  - text-classification
model_format: pickle
model_file: model.pkl

Model description

[More Information Needed]

Intended uses & limitations

[More Information Needed]

Training Procedure

[More Information Needed]

Hyperparameters

Hyperparameter	Value
memory
steps	[('tfidf', TfidfVectorizer(min_df=100, ngram_range=(1, 3), preprocessor=<function preprocessor at 0x7fa438e7a280>)), ('classifier', XGBClassifier(base_score=None, booster=None, callbacks=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, device=None, early_stopping_rounds=None, enable_categorical=False, eval_metric=None, feature_types=None, gamma=None, grow_policy=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None, max_delta_step=None, max_depth=None, max_leaves=None, min_child_weight=None, missing=nan, monotone_constraints=None, multi_strategy=None, n_estimators=None, n_jobs=None, num_parallel_tree=None, random_state=None, ...))]
verbose	True
tfidf	TfidfVectorizer(min_df=100, ngram_range=(1, 3), preprocessor=<function preprocessor at 0x7fa438e7a280>)
classifier	XGBClassifier(base_score=None, booster=None, callbacks=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, device=None, early_stopping_rounds=None, enable_categorical=False, eval_metric=None, feature_types=None, gamma=None, grow_policy=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None, max_delta_step=None, max_depth=None, max_leaves=None, min_child_weight=None, missing=nan, monotone_constraints=None, multi_strategy=None, n_estimators=None, n_jobs=None, num_parallel_tree=None, random_state=None, ...)
tfidf__analyzer	word
tfidf__binary	False
tfidf__decode_error	strict
tfidf__dtype	<class 'numpy.float64'>
tfidf__encoding	utf-8
tfidf__input	content
tfidf__lowercase	True
tfidf__max_df	1.0
tfidf__max_features
tfidf__min_df	100
tfidf__ngram_range	(1, 3)
tfidf__norm	l2
tfidf__preprocessor	<function preprocessor at 0x7fa438e7a280>
tfidf__smooth_idf	True
tfidf__stop_words
tfidf__strip_accents
tfidf__sublinear_tf	False
tfidf__token_pattern	(?u)\b\w\w+\b
tfidf__tokenizer
tfidf__use_idf	True
tfidf__vocabulary
classifier__objective	binary:logistic
classifier__base_score
classifier__booster
classifier__callbacks
classifier__colsample_bylevel
classifier__colsample_bynode
classifier__colsample_bytree
classifier__device
classifier__early_stopping_rounds
classifier__enable_categorical	False
classifier__eval_metric
classifier__feature_types
classifier__gamma
classifier__grow_policy
classifier__importance_type
classifier__interaction_constraints
classifier__learning_rate
classifier__max_bin
classifier__max_cat_threshold
classifier__max_cat_to_onehot
classifier__max_delta_step
classifier__max_depth
classifier__max_leaves
classifier__min_child_weight
classifier__missing	nan
classifier__monotone_constraints
classifier__multi_strategy
classifier__n_estimators
classifier__n_jobs
classifier__num_parallel_tree
classifier__random_state
classifier__reg_alpha
classifier__reg_lambda
classifier__sampling_method
classifier__scale_pos_weight
classifier__subsample
classifier__tree_method
classifier__validate_parameters
classifier__verbosity

Model Plot

Pipeline(steps=[('tfidf',
                 TfidfVectorizer(min_df=100, ngram_range=(1, 3),
                                 preprocessor=<function preprocessor at 0x7fa438e7a280>)),
                ('classifier',
                 XGBClassifier(base_score=None, booster=None, callbacks=None,
                               colsample_bylevel=None, colsample_bynode=None,
                               colsample_bytree=None, device=None,
                               early_stopping_rounds=None,
                               enable_categorical=False, eval_metric=None,
                               featur...importance_type=None,
                               interaction_constraints=None, learning_rate=None,
                               max_bin=None, max_cat_threshold=None,
                               max_cat_to_onehot=None, max_delta_step=None,
                               max_depth=None, max_leaves=None,
                               min_child_weight=None, missing=nan,
                               monotone_constraints=None, multi_strategy=None,
                               n_estimators=None, n_jobs=None,
                               num_parallel_tree=None, random_state=None, ...))],
         verbose=True)

Evaluation Results

Metric	Value
accuracy	0.910317
f1 score	0.910317

How to Get Started with the Model

[More Information Needed]

Model Card Authors

This model card was written by the following authors:

[More Information Needed]

Model Card Contact

You can contact the model card authors through the following channels: [More Information Needed]

Citation

Below you can find information related to citation.

BibTeX:

[More Information Needed]

get_started_code

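from pathlib import Path

import dill as pickle  # dill is used in place of the standard pickle module
import sklearn  # must be installed so the pickled pipeline can be restored
from skops import hub_utils

# Local directory that will receive the repository files.
suicide_detector_repo = Path("./suicide-detector")

hub_utils.download(
    repo_id="AndyJamesTurner/suicideDetector",
    dst=suicide_detector_repo,
)

# Load the fitted pipeline (TF-IDF vectorizer + XGBoost classifier).
with open(suicide_detector_repo / "model.pkl", "rb") as file:
    clf = pickle.load(file)

# predict takes a list of texts and returns one label per text.
classification = clf.predict(["I want to kill myself"])[0]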

model_card_authors

Andy Turner

model_description

Suicide detection text classification model.

Trained on the Suicide and Depression Detection dataset (https://www.kaggle.com/datasets/nikhileswarkomati/suicide-watch).

The model vectorises each text with a fitted TF-IDF vectorizer and then classifies the resulting vector with XGBoost.
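
The hyperparameter listing suggests a pipeline along the following lines. This is only a sketch, not the author's training code: the card shows the preprocessor only as `<function preprocessor at 0x7fa438e7a280>`, so the `preprocessor` body below is a hypothetical stand-in, and `build_pipeline` is a name made up for illustration. The values `min_df=100`, `ngram_range=(1, 3)`, `objective="binary:logistic"`, and `verbose=True` are taken from the listing.

```python
import re


def preprocessor(text: str) -> str:
    """Hypothetical stand-in: the card does not document the real function."""
    text = text.lower()
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def build_pipeline():
    """Return an unfitted TF-IDF + XGBoost pipeline using the listed settings."""
    # Imported lazily so the preprocessor can be tried without these packages.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import Pipeline
    from xgboost import XGBClassifier

    return Pipeline(
        steps=[
            (
                "tfidf",
                TfidfVectorizer(
                    min_df=100,           # keep terms seen in at least 100 texts
                    ngram_range=(1, 3),   # unigrams, bigrams, and trigrams
                    preprocessor=preprocessor,
                ),
            ),
            ("classifier", XGBClassifier(objective="binary:logistic")),
        ],
        verbose=True,
    )
```

When a custom `preprocessor` is passed, scikit-learn uses it in place of the vectorizer's built-in lowercasing and accent stripping, so any such normalisation has to happen inside that function. Because of `min_df=100`, fitting with `build_pipeline().fit(texts, labels)` needs a corpus in which at least some terms occur in 100 or more texts.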

eval_method

The model was evaluated on a 30% holdout split (test fraction 0.3) using F1 score, accuracy, a confusion matrix, and ROC curves.
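
To make the holdout idea concrete, here is a minimal standard-library sketch of a 70/30 split. It is an illustration under assumptions, not the procedure actually used: the card does not say which splitting utility, random seed, or stratification was applied, and `holdout_split` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple


def holdout_split(
    texts: Sequence[str],
    labels: Sequence[int],
    test_fraction: float = 0.3,
    seed: int = 0,
) -> Tuple[List[str], List[str], List[int], List[int]]:
    """Shuffle the examples and hold out `test_fraction` of them for evaluation."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")

    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    train_texts = [texts[i] for i in train_idx]
    test_texts = [texts[i] for i in test_idx]
    train_labels = [labels[i] for i in train_idx]
    test_labels = [labels[i] for i in test_idx]
    return train_texts, test_texts, train_labels, test_labels
```

The model is fitted on the training portion only, and every reported metric is computed on the held-out portion, which the model never sees during training.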

Confusion matrix

ROC Curve

Classification Report

index	precision	recall	f1-score	support
not suicide	0.891721	0.934126	0.912431	34824
suicide	0.930785	0.886491	0.908098	34799
macro avg	0.911253	0.910308	0.910265	69623
weighted avg	0.911246	0.910317	0.910265	69623
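
The confusion matrix image is not reproduced in this text, but its counts can be recovered from the classification report, since recall is the number of correctly classified texts divided by the support. The sketch below does that arithmetic. The counts are derived from the rounded six-decimal figures above rather than read from the original plot, and the row/column layout (rows are true classes) is an assumption.

```python
# Figures copied from the classification report above.
support = {"not suicide": 34824, "suicide": 34799}
recall = {"not suicide": 0.934126, "suicide": 0.886491}

# recall = correct / support, so correct = recall * support
# (rounded, because the report only shows six decimals).
true_negatives = round(recall["not suicide"] * support["not suicide"])
true_positives = round(recall["suicide"] * support["suicide"])
false_positives = support["not suicide"] - true_negatives
false_negatives = support["suicide"] - true_positives

total = sum(support.values())
accuracy = (true_negatives + true_positives) / total
precision_suicide = true_positives / (true_positives + false_positives)

# Micro-averaged F1 pools the counts of both classes. Each misclassified
# text is a false positive for one class and a false negative for the other.
pooled_correct = true_negatives + true_positives
pooled_errors = false_positives + false_negatives
micro_f1 = 2 * pooled_correct / (2 * pooled_correct + pooled_errors + pooled_errors)

print(
    "confusion matrix (rows = true class): "
    f"[[{true_negatives}, {false_positives}], [{false_negatives}, {true_positives}]]"
)
print(
    f"accuracy={accuracy:.6f} "
    f"precision(suicide)={precision_suicide:.6f} "
    f"micro_f1={micro_f1:.6f}"
)
```

The derived counts reproduce the reported accuracy of 0.910317 and the reported precision of 0.930785 for the suicide class. The micro-averaged F1 comes out equal to accuracy, which is consistent with the identical accuracy and f1 score values in the Evaluation Results table, although the card does not state which averaging was used.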