API DESCRIPTION

What does this API do?

The Moderation API evaluates a prompt to determine whether it is safe to send to a large language model (LLM). It returns an overall status (passed or failed) along with detailed scores for each check.

Moderation Checks

Prompt Injection: Detects attempts to hijack or manipulate the LLM's behavior.
Jailbreak Attempts: Identifies prompts trying to bypass guardrails.
Toxicity & Profanity: Flags harmful, offensive, or explicit content.
Restricted Topics: Detects categories like cheating, conspiracy, terrorism, etc.
Text Quality: Measures readability and clarity (informational only).
Customized Theme: Blocks prompts that match custom keywords (e.g., "atomic weapon").
PII Detection: Identifies AADHAR, PAN, SSN, Passport, Email, Phone, IP, Credit Card, Medical License, etc.

Full Customization

Enable/disable checks
Set thresholds
Select PII or restricted topics
Define custom block terms
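
The four options above could be gathered into one settings object before a request is made. The sketch below is only an illustration: the `ModerationConfig` class, its methods, and the payload key names are hypothetical and are not confirmed by this page. The check identifiers are the ones that appear in the page's result-rendering code, and the threshold value is an arbitrary example.

```python
from dataclasses import dataclass, field

# Check identifiers copied from the result-rendering code on this page.
KNOWN_CHECKS = (
    "promptInjectionCheck", "jailbreakCheck", "toxicityCheck",
    "profanityCheck", "restrictedtopic", "textQuality",
    "customThemeCheck", "privacyCheck", "refusalCheck",
)


@dataclass
class ModerationConfig:
    """Hypothetical settings object; not part of the documented API."""
    enabled_checks: set = field(default_factory=lambda: set(KNOWN_CHECKS))
    thresholds: dict = field(default_factory=dict)
    pii_entities: list = field(default_factory=list)
    restricted_topics: list = field(default_factory=list)
    custom_block_terms: list = field(default_factory=list)

    def _require_known(self, check):
        # Reject identifiers that are not in the known list.
        if check not in KNOWN_CHECKS:
            raise ValueError(f"unknown check: {check}")

    def disable(self, check):
        self._require_known(check)
        self.enabled_checks.discard(check)

    def set_threshold(self, check, value):
        self._require_known(check)
        self.thresholds[check] = float(value)

    def to_payload(self):
        # Key names below are illustrative assumptions.
        return {
            "checks": sorted(self.enabled_checks),
            "thresholds": dict(self.thresholds),
            "piiEntities": list(self.pii_entities),
            "restrictedTopics": list(self.restricted_topics),
            "customBlockTerms": list(self.custom_block_terms),
        }


config = ModerationConfig()
config.disable("textQuality")                      # enable/disable checks
config.set_threshold("toxicityCheck", 0.6)         # set thresholds (example value)
config.pii_entities.extend(["PAN", "SSN"])         # select PII entities
config.custom_block_terms.append("atomic weapon")  # define custom block terms
payload = config.to_payload()
```

Validating check names up front means a misspelled identifier fails immediately instead of being silently ignored.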

Resources

    # Add your detailed logic below
    if check_api_name == "promptInjectionCheck":
        st.write(f"**Confidence Score:** `{details.get('injectionConfidenceScore', 'N/A')}`")
        st.write(f"**Threshold:** `{details.get('injectionThreshold', 'N/A')}`")
    elif check_api_name == "jailbreakCheck":
        st.write(f"**Similarity Score:** `{details.get('jailbreakSimilarityScore', 'N/A')}`")
        st.write(f"**Threshold:** `{details.get('jailbreakThreshold', 'N/A')}`")
    elif check_api_name == "toxicityCheck":
        st.write(f"**Threshold:** `{details.get('toxicitythreshold', 'N/A')}`")
        if details.get("toxicityScore"):
            st.write("**Toxicity Scores:**")
            for score_obj in details["toxicityScore"]:
                for name, score in score_obj.items():
                    st.write(f"- **{name.title()}**: `{score}`")
    elif check_api_name == "profanityCheck":
        st.write(f"**Profanity Threshold:** `{details.get('profaneWordsthreshold', 'N/A')}`")
        profane_words = details.get("profaneWordsIdentified", [])
        st.write(f"**Profane Words Identified:** {', '.join(profane_words) if profane_words else 'None'}")
    elif check_api_name == "restrictedtopic":
        st.write(f"**Topic Threshold:** `{details.get('topicThreshold', 'N/A')}`")
        scores = details.get("topicScores", [])
        if scores:
            st.write("**Detected Topics Scores:**")
            for score_dict in scores:
                for topic, score in score_dict.items():
                    st.write(f"- **{topic}:** `{score}`")
        else:
            st.write("**Detected Topics:** None")
    elif check_api_name == "textQuality":
        st.write(f"**Readability Score:** `{details.get('readabilityScore', 'N/A')}`")
        st.write(f"**Text Grade:** `{details.get('textGrade', 'N/A')}`")
    elif check_api_name == "customThemeCheck":
        st.write(f"**Similarity Score:** `{details.get('customSimilarityScore', 'N/A')}`")
        st.write(f"**Theme Threshold:** `{details.get('themeThreshold', 'N/A')}`")
    elif check_api_name == "privacyCheck":
        entities = details.get("entitiesRecognised", [])
        blocked = details.get("entitiesConfiguredToBlock", [])
        st.write(f"**Entities Recognized:** {', '.join(entities) if entities else 'None'}")
        st.write(f"**Entities Configured to Block:** {', '.join(blocked) if blocked else 'None'}")
    elif check_api_name == "refusalCheck":
        st.write(f"**Similarity Score:** `{details.get('refusalSimilarityScore', 'N/A')}`")
        st.write(f"**Threshold:** `{details.get('RefusalThreshold', 'N/A')}`")
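
For use outside Streamlit, the same per-check lookups can be driven by a small table instead of an if/elif chain. The sketch below is a minimal illustration: `DETAIL_FIELDS` and `summarize` are hypothetical names, the field keys are the ones read by the rendering code above, and the sample `details` dictionary is invented. It covers only the checks whose details are simple scalar values.

```python
# (label, key) pairs per check; keys match the fields read by the rendering code.
DETAIL_FIELDS = {
    "promptInjectionCheck": [
        ("Confidence Score", "injectionConfidenceScore"),
        ("Threshold", "injectionThreshold"),
    ],
    "jailbreakCheck": [
        ("Similarity Score", "jailbreakSimilarityScore"),
        ("Threshold", "jailbreakThreshold"),
    ],
    "textQuality": [
        ("Readability Score", "readabilityScore"),
        ("Text Grade", "textGrade"),
    ],
    "customThemeCheck": [
        ("Similarity Score", "customSimilarityScore"),
        ("Theme Threshold", "themeThreshold"),
    ],
    "refusalCheck": [
        ("Similarity Score", "refusalSimilarityScore"),
        ("Threshold", "RefusalThreshold"),
    ],
}


def summarize(check_api_name, details):
    """Return 'Label: value' lines for one check, using 'N/A' for missing keys."""
    fields = DETAIL_FIELDS.get(check_api_name, [])
    return [f"{label}: {details.get(key, 'N/A')}" for label, key in fields]


# Invented sample data, for illustration only.
sample_details = {"jailbreakSimilarityScore": 0.91}
lines = summarize("jailbreakCheck", sample_details)
for line in lines:
    print(line)
```

Checks with nested results (toxicity, restricted topics, PII) would still need their own handling, as in the original code.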

INPUT

Select Moderation Checks to apply:

OUTPUT