---
license: cc0-1.0
---

**Note:** Because of the nature of toxic-comment data, the data and code contain explicit language.

The data comes from the Kaggle *Toxic Comment Classification Challenge*:
<br>
https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/data?select=train.csv.zip

A copy of the data is in the `data` directory.

The model was trained for 20 epochs on a RunPod instance.

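As a quick way to get a feel for the data, the sketch below counts how many comments carry each label. It assumes the Kaggle file has been unzipped to `data/train.csv` (the exact path inside `data` is an assumption) and that its header uses the six label columns named in the competition's `train.csv`. The helper `count_labels` is hypothetical and not part of the repository. It uses only the standard library; `pandas.read_csv` would work just as well.

```python
import csv
import os
from collections import Counter

# Label columns as they appear in the header of the Kaggle train.csv.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def count_labels(csv_path):
    """Return (row count, Counter of positive examples per label)."""
    counts = Counter({label: 0 for label in LABELS})
    rows = 0
    # newline="" lets the csv module handle comments that span several lines.
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            rows += 1
            for label in LABELS:
                counts[label] += int(row[label])
    return rows, counts


if __name__ == "__main__":
    path = os.path.join("data", "train.csv")  # assumed location of the unzipped file
    if os.path.exists(path):
        total, per_label = count_labels(path)
        print(f"{total} comments")
        for label in LABELS:
            print(f"{label}: {per_label[label]}")
    else:
        print(f"{path} not found; unzip the Kaggle data there first.")
```

A comment can carry several labels at once, so the per-label counts do not need to sum to the number of rows.
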
### 🤗 Running demo here:
https://huggingface.co/spaces/vluz/Tox

<hr>

The code requires pandas, TensorFlow, and Streamlit, all of which can be installed via `pip`.

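To confirm that the three packages are available before running anything, a small check like the following may help. This is only a sketch: `missing_packages` is a hypothetical helper, and it tests whether each package can be found without importing it.

```python
from importlib.util import find_spec

# Packages named in this README.
REQUIRED = ("pandas", "tensorflow", "streamlit")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are installed.")
```
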
```python
import os
import pickle
import streamlit as st
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization


@st.cache_resource
def load_model():
    # Cached by Streamlit so the model is not reloaded on every rerun.
    model = tf.keras.models.load_model(os.path.join("model", "toxmodel.keras"))
    return model


@st.cache_resource
def load_vectorizer():
    # The pickle holds the layer's config and weights, not the layer itself.
    with open(os.path.join("model", "vectorizer.pkl"), "rb") as f:
        from_disk = pickle.load(f)
    new_v = TextVectorization.from_config(from_disk['config'])
    new_v.adapt(tf.data.Dataset.from_tensor_slices(["xyz"]))  # fix for Keras bug
    new_v.set_weights(from_disk['weights'])
    return new_v


st.title("Toxic Comment Test")
st.divider()
model = load_model()
vectorizer = load_vectorizer()
default_prompt = "i love you man, but fuck you!"
input_text = st.text_area("Comment:", default_prompt, height=150).lower()
if st.button("Test"):
    if not input_text:
        st.write("⚠ Warning: Empty prompt.")
    elif len(input_text) < 15:
        st.write("⚠ Warning: Model is far less accurate with a small prompt.")
    if input_text == default_prompt:
        st.write("Expected results from default prompt are positive for 0 and 2")
    with st.spinner("Testing..."):
        inputv = vectorizer([input_text])  # text -> integer token ids
        output = model.predict(inputv)     # one score per label
        res = (output > 0.5)               # True where a score exceeds 0.5
    st.write(["toxic","severe toxic","obscene","threat","insult","identity hate"], res)
    st.write(output)
```

Put `toxmodel.keras` and `vectorizer.pkl` into the `model` directory.

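Before launching the app, it can be useful to confirm that both files are where the code expects them. The sketch below does that; `missing_model_files` is a hypothetical helper, not part of the repository, and it assumes the `model` directory sits next to the script.

```python
from pathlib import Path

# File names the app loads from the model directory.
REQUIRED_FILES = ("toxmodel.keras", "vectorizer.pkl")


def missing_model_files(model_dir="model"):
    """Return the required file names that are absent from model_dir."""
    base = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing from the model directory: " + ", ".join(missing))
    else:
        print("Both model files are in place.")
```
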
Then run:
```
streamlit run toxtest.py
```

The default prompt should test positive for labels 0 (toxic) and 2 (obscene).

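The numbers 0 and 2 are positions in the model's six-value output. As a minimal sketch of how those positions map to names, assuming the label order used in the app's `st.write` call and the same 0.5 threshold, the hypothetical helper `decode_scores` below turns one row of scores into label names. The example scores are made up for illustration and are not real model output.

```python
# Label order as listed in the app code.
LABELS = ["toxic", "severe toxic", "obscene", "threat", "insult", "identity hate"]


def decode_scores(scores, threshold=0.5):
    """Return the names of the labels whose score exceeds the threshold."""
    return [label for label, score in zip(LABELS, scores) if score > threshold]


# Made-up scores for illustration only.
example = [0.91, 0.02, 0.77, 0.01, 0.30, 0.03]
print(decode_scores(example))  # ['toxic', 'obscene']
```

In the app, `model.predict` returns one row per input comment, so the equivalent call there would be `decode_scores(output[0])`.
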
<hr>

Full code can be found here:
<br>
https://github.com/vluz/ToxTest/
<br>