PradAgrawal committed
Commit 68635d7 · verified · 1 Parent(s): 6557ad0

Upload 4 files
Files changed (4)
  1. README.md +66 -11
  2. app.py +232 -0
  3. coaching_voices.json +128 -0
  4. requirements.txt +11 -0
README.md CHANGED
@@ -1,14 +1,69 @@
 ---
-title: NeuroShieldApp
-emoji: 👀
-colorFrom: pink
-colorTo: pink
-sdk: streamlit
-sdk_version: 1.44.1
-app_file: app.py
-pinned: false
-license: mit
-short_description: An AI App for MultiModal Moderation & Rewrite Coaching
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# 🛡️ NeuroShield PoC (Enhanced Edition)
+
+A powerful AI-based moderation assistant built with Streamlit, Hugging Face Transformers, and the Groq API. Designed for nuanced, voice-guided responses to online toxicity.
+
+---
+
+## 🚀 Features
+
+- ✅ **14-label toxicity classification** (simulated Jigsaw + extended logic)
+- 🧠 **Coaching voice personas** (choose a tone: compassionate, assertive, reflective, etc.)
+- 🔥 **Visual indicators** (emoji SAFE/UNSAFE + toxicity heatmap)
+- 🎚️ **Tolerance control** for each toxicity category
+- 🧒 **Kids Mode** and **NSFW filters**
+- ✍️ **Groq LLM rewrites** in the selected tone/strategy
+
+---
+
+## 📦 Files Included
+
+- `app.py` — Streamlit frontend and logic
+- `requirements.txt` — Python dependencies
+- `coaching_voices.json` — Tone-guided response schema
+
 ---
+
+## 🧠 Coaching Voice Selector
+
+This system uses customizable tones such as:
+
+- The Boundary Setter
+- The Mirror
+- The Compassionate Reframer
+- The Challenger
+
+Add more in `coaching_voices.json`.
+
 ---
 
+## 💻 Local Setup
+
+```bash
+pip install -r requirements.txt
+streamlit run app.py
+```
+
+---
+
+## 🧠 Deployment on Hugging Face Spaces
+
+1. Create a new Space (Python + Streamlit)
+2. Upload:
+   - `app.py`
+   - `requirements.txt`
+   - `coaching_voices.json`
+3. Add `GROQ_API_KEY` in **Secrets** (Settings → Repository secrets)
+
+---
+
+## 🔐 Secrets Configuration
+
+Add the following in Hugging Face Spaces under `Repository secrets`:
+
+```
+GROQ_API_KEY=your-groq-api-key
+```
+
+---
+
+## 🌐 License
+
+MIT © 2025 — Built for research, teaching, and safe digital conversation.
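The deployment steps above depend on the key lookup order `app.py` uses: the environment variable (how Spaces exposes Repository secrets) is checked first, then a local secrets store. A minimal sketch of that order, with plain dicts standing in for `os.environ` and `st.secrets` (the helper name `resolve_groq_key` is illustrative, not part of the app):

```python
def resolve_groq_key(env: dict, local_secrets: dict):
    """Return the Groq API key, preferring the environment (HF Spaces
    Repository secrets) over a local secrets store such as
    .streamlit/secrets.toml; None if neither is configured."""
    return env.get("GROQ_API_KEY") or local_secrets.get("GROQ_API_KEY")

# On Spaces the secret arrives as an environment variable:
print(resolve_groq_key({"GROQ_API_KEY": "sk-env"}, {}))        # sk-env
# Locally it falls back to the secrets file:
print(resolve_groq_key({}, {"GROQ_API_KEY": "sk-local"}))      # sk-local
```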
app.py ADDED
@@ -0,0 +1,232 @@
+import streamlit as st
+import os
+import time
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+from groq import Groq
+
+# --------------------------------------------------------------------------
+# Configuration & Model Loading (Cached for efficiency)
+# --------------------------------------------------------------------------
+CLASSIFIER_MODEL_NAME = "unitary/toxic-bert"
+LLM_MODEL_GROQ = "llama3-8b-8192"  # Or mixtral-8x7b-32768
+
+st.set_page_config(page_title="NeuroShield PoC", layout="wide")
+
+# Use Streamlit's caching for expensive operations like model loading
+@st.cache_resource
+def load_classifier_model():
+    """Loads the classifier model and tokenizer."""
+    print("Loading classifier model and tokenizer...")
+    try:
+        tokenizer = AutoTokenizer.from_pretrained(CLASSIFIER_MODEL_NAME)
+        model = AutoModelForSequenceClassification.from_pretrained(CLASSIFIER_MODEL_NAME)
+        # Determine device (usually CPU on free HF Spaces, unless a GPU is assigned)
+        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+        model.to(device)
+        model.eval()
+        print(f"Classifier model loaded on {device}.")
+        # Get labels from model config
+        model_labels = [model.config.id2label[i] for i in range(model.config.num_labels)]
+        return tokenizer, model, device, model_labels
+    except Exception as e:
+        st.error(f"Error loading classifier model: {e}")
+        print(f"Error loading classifier model: {e}")
+        return None, None, None, []
+
+@st.cache_resource
+def initialize_groq_client():
+    """Initializes the Groq client using the API key from secrets."""
+    print("Initializing Groq client...")
+    try:
+        # Use os.environ on HF Spaces, or st.secrets for Streamlit Community Cloud
+        groq_api_key = os.environ.get('GROQ_API_KEY')
+        if not groq_api_key:
+            # Fallback for local testing if using secrets.toml
+            try:
+                groq_api_key = st.secrets["GROQ_API_KEY"]
+            except Exception:
+                st.warning("GROQ_API_KEY not found in environment variables or st.secrets.")
+                return None
+
+        if not groq_api_key:
+            st.warning("Groq API Key not configured.")
+            return None
+        else:
+            client = Groq(api_key=groq_api_key)
+            print("Groq client initialized.")
+            return client
+    except Exception as e:
+        st.error(f"Error initializing Groq client: {e}")
+        print(f"Error initializing Groq client: {e}")
+        return None
+
+# --- Load models and clients ---
+tokenizer, model, device, model_labels = load_classifier_model()
+groq_client = initialize_groq_client()
+
+# --------------------------------------------------------------------------
+# Core Logic Functions
+# --------------------------------------------------------------------------
+def classify_text(text, threshold=0.5):
+    """Classifies input text using the loaded multi-label model."""
+    if model is None or tokenizer is None or device is None or not model_labels:
+        st.error("Classifier model/tokenizer not loaded properly.")
+        return None
+
+    start_time = time.time()
+    try:
+        inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
+        inputs = {k: v.to(device) for k, v in inputs.items()}
+
+        with torch.no_grad():
+            outputs = model(**inputs)
+
+        probabilities = torch.sigmoid(outputs.logits).squeeze().cpu().numpy()
+        results = {}
+        for i, label in enumerate(model_labels):
+            if i < len(probabilities):
+                prob = probabilities[i]
+                if prob > threshold:
+                    results[label] = round(float(prob), 4)
+            else:
+                print(f"Warning: Index {i} out of bounds for probabilities")
+
+        end_time = time.time()
+        print(f"Classification took {end_time - start_time:.4f} seconds.")
+        return results
+
+    except Exception as e:
+        st.error(f"An error occurred during classification: {e}")
+        print(f"An error occurred during classification: {e}")
+        return None
+
+
+def rewrite_text_groq(original_text, detected_labels_dict, persona="helpful assistant", tone="neutral"):
+    """Rewrites the input text using the Groq API."""
+    if not groq_client:
+        st.error("Groq client not initialized. Cannot perform rewrite.")
+        return "Error: Groq client not initialized."
+
+    # Construct the prompt
+    if not detected_labels_dict:
+        detected_labels_list_str = "None relevant"
+        prompt_template = f"""You are a {persona}. A user wrote: "{original_text}"
+
+Rewrite the message in a {tone} tone while keeping its essential meaning intact. Since no specific problematic categories were flagged, focus on ensuring the tone is appropriate and constructive."""
+    else:
+        detected_labels_list_str = ", ".join(detected_labels_dict.keys())
+        prompt_template = f"""You are a {persona}. A user wrote: "{original_text}"
+
+Rewrite the message in a {tone} tone while keeping its essential meaning intact.
+
+Explain briefly why the original might be perceived as unsafe or negative, focusing on the potential impact rather than just listing labels.
+
+Ensure the rewritten message does NOT contain content related to the following categories: {detected_labels_list_str}. The goal is a safer, constructive alternative."""
+
+    print("\n--- Sending Request to Groq ---")
+    print(f"Model: {LLM_MODEL_GROQ}")
+    # print(f"Prompt:\n{prompt_template}\n" + "-"*20)  # Avoid printing long prompts in logs
+
+    start_time = time.time()
+    try:
+        chat_completion = groq_client.chat.completions.create(
+            messages=[{"role": "user", "content": prompt_template}],
+            model=LLM_MODEL_GROQ,
+            temperature=0.6,
+            max_tokens=350,  # Increased slightly
+        )
+        end_time = time.time()
+        print(f"Groq response received in {end_time - start_time:.2f} seconds.")
+        rewritten_content = chat_completion.choices[0].message.content.strip()
+        return rewritten_content
+
+    except Exception as e:
+        st.error(f"Error interacting with Groq: {e}")
+        print(f"Error interacting with Groq: {e}")
+        return f"Error: Failed to get rewrite from Groq. {e}"
+
+
+def moderation_pipeline(input_text, classification_threshold=0.5):
+    """Runs the full classification and rewrite pipeline."""
+    print("\n--- Running Streamlit Pipeline for input ---")
+    pipeline_results = {
+        "original_text": input_text,
+        "detected_labels": {},
+        "rewrite_attempt": "(Not Attempted)",
+        "error": None
+    }
+
+    # 1. Classification
+    class_results = classify_text(input_text, threshold=classification_threshold)
+    if class_results is None:
+        pipeline_results["error"] = "Classification failed. Check logs."
+        return pipeline_results
+    pipeline_results["detected_labels"] = class_results
+    print(f"Classification Results: {class_results if class_results else 'None above threshold'}")
+
+    # 2. Rewrite (using Groq)
+    rewrite = rewrite_text_groq(input_text, class_results, persona="content moderator", tone="neutral and constructive")
+    pipeline_results["rewrite_attempt"] = rewrite
+
+    print("--- Pipeline Finished ---")
+    return pipeline_results
+
+# --------------------------------------------------------------------------
+# Streamlit UI Layout
+# --------------------------------------------------------------------------
+
+st.title("NeuroShield Proof-of-Concept")
+st.markdown("A demonstration using a pre-trained toxicity classifier (`unitary/toxic-bert`) and an LLM rewrite suggestion via the Groq API (`llama3-8b`). Enter text below and click 'Moderate'.")
+st.markdown("---")  # Separator
+
+# Initialize session state to hold results
+if 'pipeline_results' not in st.session_state:
+    st.session_state.pipeline_results = None
+
+# Input Text Area
+user_input = st.text_area("Enter text to moderate:", height=100, key="user_input_area")
+
+# Moderate Button
+if st.button("Moderate Text", key="moderate_button"):
+    if user_input:
+        # Show a spinner while processing
+        with st.spinner("Moderating..."):
+            # Check if prerequisites are loaded
+            if model and tokenizer and groq_client:
+                results = moderation_pipeline(user_input)
+                st.session_state.pipeline_results = results  # Store results in session state
+            else:
+                st.error("Models or API client failed to load. Cannot moderate.")
+                st.session_state.pipeline_results = {"error": "Models or API client failed to load."}
+    else:
+        st.warning("Please enter some text to moderate.")
+        st.session_state.pipeline_results = None  # Clear results if input is empty

+# Display Results (using columns for better layout)
+if st.session_state.pipeline_results:
+    results = st.session_state.pipeline_results
+    st.markdown("---")  # Separator
+    st.subheader("Moderation Results")
+
+    col1, col2 = st.columns(2)
+
+    with col1:
+        st.metric(label="Input Text Status", value="Processed")
+        st.markdown("**Detected Labels & Scores**")
+        if results.get("error"):
+            st.error(f"Pipeline Error: {results['error']}")
+        elif results.get("detected_labels"):
+            st.json(results["detected_labels"])
+        else:
+            st.success("No problematic labels detected above threshold.")
+
+    with col2:
+        st.markdown("**Rewrite Suggestion**")
+        rewrite_text = results.get("rewrite_attempt", "Rewrite not generated.")
+        # Use a text area to display the rewrite, making it copyable
+        st.text_area("Suggested Rewrite:", value=rewrite_text, height=250, disabled=True, key="rewrite_output_area")
+
+# Footer
+st.markdown("---")
+st.caption("Powered by Hugging Face Transformers and Groq API.")
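The multi-label scoring in `classify_text` above boils down to: sigmoid each logit independently, then keep only the labels whose probability clears the threshold. A dependency-free sketch of that step (the toy logits and six Jigsaw-style labels are made up for illustration):

```python
import math

def sigmoid(x: float) -> float:
    """Plain logistic function, as torch.sigmoid applies elementwise."""
    return 1.0 / (1.0 + math.exp(-x))

def labels_above_threshold(logits, labels, threshold=0.5):
    """Mimic classify_text: sigmoid each logit and keep labels whose
    probability exceeds the threshold, rounded to 4 decimals."""
    results = {}
    for label, logit in zip(labels, logits):
        prob = sigmoid(logit)
        if prob > threshold:
            results[label] = round(prob, 4)
    return results

# Toy logits for the six classic Jigsaw labels (made-up values).
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
logits = [2.0, -3.0, 0.1, -5.0, 1.2, -4.0]
print(labels_above_threshold(logits, labels))
# → {'toxic': 0.8808, 'obscene': 0.525, 'insult': 0.7685}
```

Because each label is thresholded independently, a message can carry several labels at once, or none, which is exactly why the UI shows a dict of label→score rather than a single class.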
coaching_voices.json ADDED
@@ -0,0 +1,128 @@
+[
+  {
+    "voice_id": "boundary_setter",
+    "name": "The Boundary Setter",
+    "tone": "firm_respectful",
+    "response_strategy": "Name the behavior, assert limit, disengage",
+    "emotional_attitude": "assertive",
+    "communication_goal": "psychological safety, clear limits",
+    "example_response": "That comment crosses a line. I\u2019m not okay with this tone, and I won\u2019t engage further unless we can have a respectful conversation.",
+    "response_templates": [
+      "I hear what you said, and I want to be clear that [boundary]. I\u2019m stepping away from this.",
+      "Let\u2019s pause here. I won\u2019t engage in conversations that feel [emotionally unsafe/disrespectful].",
+      "This doesn\u2019t work for me. We can continue only if we shift the tone."
+    ],
+    "keywords_triggered_by": [
+      "stop",
+      "enough",
+      "crossed a line",
+      "disrespect",
+      "tone"
+    ],
+    "usage_contexts": [
+      "harassment",
+      "hate",
+      "violence"
+    ],
+    "applicable_toxicity_categories": [
+      "harassment",
+      "harassment threatening",
+      "hate",
+      "violence"
+    ],
+    "default_response_length": "short",
+    "escalation_sensitivity": 0.85,
+    "persona_notes": "Use when asserting boundaries is more important than reconciliation."
+  },
+  {
+    "voice_id": "mirror",
+    "name": "The Mirror",
+    "tone": "calm_reflective",
+    "response_strategy": "Restate the toxic statement in neutral terms to expose its nature",
+    "emotional_attitude": "dispassionate",
+    "communication_goal": "de-escalation and reflection",
+    "example_response": "You\u2019re saying I\u2019m stupid\u2014can you help me understand what you hoped that would accomplish?",
+    "response_templates": [
+      "You said '[quote]'. I\u2019m curious\u2014what were you hoping to achieve with that?",
+      "Let\u2019s look at what was just said: '[quote]'. That\u2019s worth reflecting on."
+    ],
+    "keywords_triggered_by": [
+      "idiot",
+      "stupid",
+      "dumb"
+    ],
+    "usage_contexts": [
+      "gaslighting",
+      "trolling",
+      "conflict"
+    ],
+    "applicable_toxicity_categories": [
+      "harassment",
+      "insult",
+      "hate"
+    ],
+    "default_response_length": "medium",
+    "escalation_sensitivity": 0.5,
+    "persona_notes": "Useful for showing people their behavior without adding emotional fuel."
+  },
+  {
+    "voice_id": "compassionate_reframer",
+    "name": "The Compassionate Reframer",
+    "tone": "gentle",
+    "response_strategy": "Acknowledge pain, redirect energy, invite empathy",
+    "emotional_attitude": "empathetic",
+    "communication_goal": "emotional repair and reconnection",
+    "example_response": "I can hear there\u2019s frustration behind your words. Maybe there\u2019s a better way to talk about what\u2019s bothering you?",
+    "response_templates": [
+      "Sounds like you\u2019re upset. Want to tell me what\u2019s really going on?",
+      "That felt harsh\u2014want to try again in a way that helps us understand each other?"
+    ],
+    "keywords_triggered_by": [
+      "shut up",
+      "annoying",
+      "angry"
+    ],
+    "usage_contexts": [
+      "emotional conflict",
+      "relational tension"
+    ],
+    "applicable_toxicity_categories": [
+      "harassment",
+      "insult",
+      "self harm intent"
+    ],
+    "default_response_length": "medium",
+    "escalation_sensitivity": 0.4,
+    "persona_notes": "For people who prefer to meet aggression with care and redirect the conversation."
+  },
+  {
+    "voice_id": "challenger",
+    "name": "The Challenger",
+    "tone": "bold",
+    "response_strategy": "Call out bad behavior directly, use logic or ethics",
+    "emotional_attitude": "provocative",
+    "communication_goal": "confrontation and accountability",
+    "example_response": "If you believe that\u2019s okay to say, let\u2019s examine that. What if someone said that to someone you care about?",
+    "response_templates": [
+      "That sounds wrong\u2014why do you believe that\u2019s acceptable?",
+      "Let\u2019s be honest: would you say that to someone in person?"
+    ],
+    "keywords_triggered_by": [
+      "you people",
+      "always",
+      "never"
+    ],
+    "usage_contexts": [
+      "hate",
+      "bullying"
+    ],
+    "applicable_toxicity_categories": [
+      "hate",
+      "hate instructions",
+      "violence"
+    ],
+    "default_response_length": "medium",
+    "escalation_sensitivity": 0.7,
+    "persona_notes": "Use when users want to stand their ground while staying thoughtful."
+  }
+]
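`app.py` does not show how a voice is chosen from this schema, but one plausible selector matches detected toxicity categories against `applicable_toxicity_categories`, breaking ties with `escalation_sensitivity`. A sketch under that assumption, with two trimmed entries inlined so it runs standalone (the `pick_voice` helper is hypothetical, not code from this commit):

```python
import json

# Trimmed, inline copies of two entries from coaching_voices.json so the
# sketch is self-contained; the real app would read the file from disk.
VOICES_JSON = """
[
  {"voice_id": "boundary_setter",
   "applicable_toxicity_categories": ["harassment", "hate", "violence"],
   "escalation_sensitivity": 0.85},
  {"voice_id": "mirror",
   "applicable_toxicity_categories": ["harassment", "insult", "hate"],
   "escalation_sensitivity": 0.5}
]
"""

def pick_voice(voices, detected_categories):
    """Return the voice_id whose applicable categories overlap the detected
    ones the most, breaking ties by higher escalation_sensitivity."""
    def score(voice):
        overlap = len(set(voice["applicable_toxicity_categories"]) & set(detected_categories))
        return (overlap, voice["escalation_sensitivity"])
    return max(voices, key=score)["voice_id"]

voices = json.loads(VOICES_JSON)
print(pick_voice(voices, ["insult"]))  # → mirror (only voice covering "insult")
print(pick_voice(voices, ["hate"]))    # → boundary_setter (tie broken by sensitivity)
```

Keeping selection data-driven like this is what lets new personas be added by editing the JSON alone, as the README suggests.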
requirements.txt ADDED
@@ -0,0 +1,11 @@
+transformers
+torch
+accelerate
+# ipywidgets is usually not needed for streamlit deployment
+streamlit
+groq
+# Pin versions if needed for stability, e.g.:
+# streamlit==1.32.0
+# transformers==4.38.0
+# torch==2.1.0  # Check compatibility with HF Spaces hardware/CUDA if needed
+# groq==0.5.0