Nepali Hate Content Detection — API Reference

Base URL: http://localhost:8000
Interactive docs: http://localhost:8000/docs (Swagger UI)
Start server (run from the major_project/ root): uvicorn backend.app.main:app --reload --host 0.0.0.0 --port 8000

1. Health Check
2. Status & Capabilities
3. Predict — Single Text
4. Analyze — Preprocessing Info
5. Explain — LIME
6. Explain — SHAP
7. Explain — Captum IG
8. Batch Predict (Streaming)
9. History — Fetch
10. History — Stats
11. History — Clear
12. Error Reference
13. TypeScript Types
14. Frontend Integration Guide

1. Health Check

GET /health

Returns whether the server is up and the model has finished loading. Call this on app mount to gate the UI.

Response 200

{
  "status": "ok",
  "model_loaded": true,
  "device": "cpu"
}

Field	Type	Notes
`status`	`string`	Always `"ok"` if server is running
`model_loaded`	`boolean`	`false` while model is still downloading/loading at startup
`device`	`string`	`"cpu"` or `"cuda"`

Frontend use: Poll this every 2 seconds on mount until model_loaded === true, then unlock the main UI.
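
The polling described above could be sketched as follows. This is a minimal illustration, not the project's frontend code: `waitForModel`, its `check` and `sleep` parameters, and the `maxAttempts` cap are hypothetical names introduced here so the loop can be exercised without a running server.

```typescript
// Shape of GET /health, as documented above.
interface HealthResponse {
  status: string;
  model_loaded: boolean;
  device: string;
}

// Hypothetical helper: calls `check` every `intervalMs` until model_loaded is true.
// `check` and `sleep` are injected so the loop does not depend on a real server or timer.
async function waitForModel(
  check: () => Promise<HealthResponse>,
  sleep: (ms: number) => Promise<void>,
  intervalMs = 2000,
  maxAttempts = 150, // assumption: give up after roughly 5 minutes
): Promise<HealthResponse> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const health = await check();
      if (health.model_loaded) return health;
    } catch {
      // The server may not be accepting connections yet; keep polling.
    }
    await sleep(intervalMs);
  }
  throw new Error(`Model not loaded after ${maxAttempts} attempts`);
}
```

In a browser, `check` would typically wrap `fetch` on `/health` and parse the JSON body, and `sleep` would wrap `setTimeout`.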

2. Status & Capabilities

GET /api/status

Returns which optional XAI packages are installed. Call once on load to decide which Explain buttons to show or hide.

Response 200

{
  "model_loaded": true,
  "device": "cpu",
  "preprocessor": true,
  "lime": true,
  "shap": true,
  "captum": false
}

Field	Type	Notes
`model_loaded`	`boolean`	Same as `/health`
`device`	`string`	`"cpu"` or `"cuda"`
`preprocessor`	`boolean`	If `false`, raw text is passed to model without script conversion
`lime`	`boolean`	Whether `lime` package is installed
`shap`	`boolean`	Whether `shap` package is installed
`captum`	`boolean`	Whether `captum` package is installed

Frontend use: If captum === false, disable the Captum tab. Same for LIME/SHAP.
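
One way to turn the status payload into a list of enabled Explain tabs is sketched below. `availableExplainers` and `ExplainMethod` are hypothetical names, and returning an empty list while the model is still loading is an assumption of this sketch rather than documented backend behavior.

```typescript
// Shape of GET /api/status, as documented above.
interface StatusResponse {
  model_loaded: boolean;
  device: string;
  preprocessor: boolean;
  lime: boolean;
  shap: boolean;
  captum: boolean;
}

type ExplainMethod = "lime" | "shap" | "captum";

// Hypothetical helper: returns the explainers whose packages are installed.
// Assumption: no explainer is usable until the model itself has loaded.
function availableExplainers(status: StatusResponse): ExplainMethod[] {
  if (!status.model_loaded) return [];
  const methods: ExplainMethod[] = ["lime", "shap", "captum"];
  return methods.filter((method) => status[method]);
}
```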

3. Predict — Single Text

POST /api/predict
Content-Type: application/json

Core endpoint. Preprocesses input → runs XLM-RoBERTa-large → returns label + probabilities + preprocessing details.

Request body

{
  "text": "महिलाले घरमा बस्नु पर्छ",
  "save_to_history": true
}

Field	Type	Required	Notes
`text`	`string`	✅	1–5000 chars. Devanagari, Romanized Nepali, English, or mixed. Must not be whitespace only
`save_to_history`	`boolean`	❌	Default `true`. Saves the result to `data/prediction_history.jsonl` as a background task

Response 200

{
  "prediction": "OS",
  "confidence": 0.9909,
  "probabilities": {
    "NO": 0.0034,
    "OO": 0.0041,
    "OR": 0.0016,
    "OS": 0.9909
  },
  "original_text": "महिलाले घरमा बस्नु पर्छ",
  "preprocessed_text": "महिलाले घरमा बस्नु पर्छ",
  "emoji_features": {
    "has_hate_emoji": 0,
    "has_mockery_emoji": 0,
    "has_positive_emoji": 0,
    "has_sadness_emoji": 0,
    "has_fear_emoji": 0,
    "has_disgust_emoji": 0,
    "hate_emoji_count": 0,
    "mockery_emoji_count": 0,
    "positive_emoji_count": 0,
    "sadness_emoji_count": 0,
    "fear_emoji_count": 0,
    "disgust_emoji_count": 0,
    "total_emoji_count": 0,
    "hate_to_positive_ratio": 0.0,
    "has_mixed_sentiment": 0,
    "unknown_emoji_count": 0,
    "has_unknown_emoji": 0,
    "known_emoji_ratio": 1.0
  },
  "script_info": {
    "script_type": "devanagari",
    "confidence": 0.98
  },
  "error": null
}

Prediction labels

Label	Meaning	Display color
`NO`	Non-offensive	Green `#28a745`
`OO`	Other-offensive (general)	Yellow `#ffc107`
`OR`	Offensive-Racist (race/ethnicity/religion hate)	Red `#dc3545`
`OS`	Offensive-Sexist (gender/sexuality hate)	Purple `#6f42c1`

emoji_features fields

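18 fields total. All are int except hate_to_positive_ratio and known_emoji_ratio, which are float. The table below describes a subset; the mockery, sadness, fear, and disgust fields follow the same has_*/​*_count naming pattern as the hate and positive fields.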

Field	Description
`has_hate_emoji`	Binary flag: 1 if text contains anger/weapon emojis
`hate_emoji_count`	Count of hate-related emojis
`has_positive_emoji`	Binary flag
`positive_emoji_count`	Count
`total_emoji_count`	Total emoji count
`hate_to_positive_ratio`	`hate_count / max(positive_count, 1)`
`has_mixed_sentiment`	1 if both hate and positive emojis present
`unknown_emoji_count`	Emojis not in the mapping dictionary
`known_emoji_ratio`	Fraction of emojis that have Nepali translations

script_info fields

Field	Description
`script_type`	One of: `devanagari`, `romanized_nepali`, `english`, `mixed`, `other`
`confidence`	Float 0–1

Error cases

Status	Condition
`422`	Empty or whitespace-only text
`503`	Model not yet loaded
`503`	Out of memory during inference
`500`	Unexpected server error
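
As an illustration of how the `probabilities` object and the label colors above might feed a probability bar chart, here is a small sketch. `toProbabilityBars`, `ClassLabel`, and `ProbabilityBar` are hypothetical names, and sorting the bars by descending probability is a presentation choice made here, not something the API prescribes.

```typescript
type ClassLabel = "NO" | "OO" | "OR" | "OS";

// Display colors from the prediction labels table above.
const LABEL_COLORS: Record<ClassLabel, string> = {
  NO: "#28a745",
  OO: "#ffc107",
  OR: "#dc3545",
  OS: "#6f42c1",
};

interface ProbabilityBar {
  label: ClassLabel;
  probability: number;
  color: string;
}

// Hypothetical helper: one bar per class, highest probability first.
function toProbabilityBars(probabilities: Record<ClassLabel, number>): ProbabilityBar[] {
  const labels: ClassLabel[] = ["NO", "OO", "OR", "OS"];
  return labels
    .map((label) => ({ label, probability: probabilities[label], color: LABEL_COLORS[label] }))
    .sort((a, b) => b.probability - a.probability);
}
```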

4. Analyze — Preprocessing Info

POST /api/analyze
Content-Type: application/json

Lightweight endpoint — runs only script detection and emoji analysis, does not run the model. Use for the preprocessing details panel without triggering a full prediction.

Request body

{
  "text": "timi murkha chau 😡"
}

Field	Type	Required	Notes
`text`	`string`	✅	1–5000 chars

Response 200

{
  "script_info": {
    "script_type": "romanized_nepali",
    "confidence": 0.80
  },
  "emoji_info": {
    "emojis_found": ["😡"],
    "total_count": 1,
    "known_emojis": ["😡"],
    "known_count": 1,
    "unknown_emojis": [],
    "unknown_count": 0,
    "coverage": 1.0
  }
}

emoji_info fields

Field	Type	Description
`emojis_found`	`string[]`	All emoji characters found in text
`total_count`	`number`	Total emoji count
`known_emojis`	`string[]`	Emojis that have a Nepali translation mapping
`known_count`	`number`	Number of emojis in `known_emojis`
`unknown_emojis`	`string[]`	Emojis not in the mapping dictionary
`unknown_count`	`number`	Number of emojis in `unknown_emojis`
`coverage`	`number`	`known_count / total_count`, or `1.0` if no emojis
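
The `coverage` rule in the table can be written out as a one-line function. This is only a restatement of the documented formula for client-side checks; `emojiCoverage` is a hypothetical name and is not part of the backend.

```typescript
// coverage = known_count / total_count, or 1.0 when the text contains no emojis.
function emojiCoverage(knownCount: number, totalCount: number): number {
  return totalCount === 0 ? 1.0 : knownCount / totalCount;
}
```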

5. Explain — LIME

POST /api/explain/lime
Content-Type: application/json

Generates word-level importance scores using LIME (Local Interpretable Model-agnostic Explanations). LIME perturbs the preprocessed text, so token labels always align with what the model saw.

Request body

{
  "text": "महिलाले घरमा बस्नु पर्छ",
  "num_samples": 200,
  "n_steps": 50
}

Field	Type	Required	Default	Notes
`text`	`string`	✅	—	1–2000 chars (shorter limit than predict — LIME runs many model calls)
`num_samples`	`integer`	❌	`200`	Range 50–1000. Higher = more reliable scores, higher latency
`n_steps`	`integer`	❌	`50`	Only used by Captum, ignored here

Response 200

{
  "method": "LIME",
  "prediction": "OS",
  "confidence": 0.9909,
  "word_scores": [
    { "word": "घरमा", "score": 0.182 },
    { "word": "महिलाले", "score": 0.143 },
    { "word": "बस्नु", "score": 0.091 },
    { "word": "पर्छ", "score": -0.034 }
  ],
  "preprocessed_text": "महिलाले घरमा बस्नु पर्छ",
  "convergence_delta": null,
  "error": null
}

word_scores interpretation

Score	Meaning
Positive	Word pushes prediction toward the predicted class
Negative	Word pushes prediction away from the predicted class
High absolute value	Strong influence

Words are returned in the order LIME produces them, which is by score magnitude. If the chart depends on a strict ranking, sort by abs(score) descending on the client rather than relying on that order.

Frontend rendering: Horizontal bar chart. Positive bars green, negative bars red. Display word on the y-axis.
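
The sorting and coloring rules above could be prepared for a chart library along these lines. `toBars` and `Bar` are hypothetical names, and treating a score of exactly zero as positive is an assumption of this sketch.

```typescript
interface WordScore {
  word: string;
  score: number;
}

interface Bar {
  word: string;
  score: number;
  color: "green" | "red";
}

// Hypothetical helper: ranks words by absolute score and assigns a bar color.
// The input array is copied first so the API response is not mutated.
function toBars(wordScores: WordScore[]): Bar[] {
  return [...wordScores]
    .sort((a, b) => Math.abs(b.score) - Math.abs(a.score))
    .map(({ word, score }): Bar => ({ word, score, color: score >= 0 ? "green" : "red" }));
}
```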

6. Explain — SHAP

POST /api/explain/shap
Content-Type: application/json

Generates attributions using SHAP. Falls back to leave-one-out occlusion if the primary SHAP text masker fails.

Request body: same shape as LIME. Both num_samples and n_steps are ignored.

{
  "text": "महिलाले घरमा बस्नु पर्छ"
}

Response 200

{
  "method": "SHAP",
  "prediction": "OS",
  "confidence": 0.9909,
  "word_scores": [
    { "word": "घरमा", "score": 0.211 },
    { "word": "महिलाले", "score": 0.178 },
    { "word": "बस्नु", "score": 0.095 },
    { "word": "पर्छ", "score": -0.021 }
  ],
  "preprocessed_text": "महिलाले घरमा बस्नु पर्छ",
  "convergence_delta": null,
  "error": null
}

Word scores are sorted by descending absolute value — most influential words first.

If the fallback was used, error will be "Used gradient_fallback" (not a failure — result is still valid).
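
Because the `error` field doubles as a fallback notice here, a client may want to separate the two cases before showing an error banner. The sketch below assumes the notice can be recognized by the substring `gradient_fallback`; `classifyExplainError` and `ExplainOutcome` are hypothetical names.

```typescript
type ExplainOutcome = "ok" | "fallback" | "failed";

// Hypothetical helper: distinguishes a fallback notice from a real failure.
// Assumption: the fallback notice always contains "gradient_fallback".
function classifyExplainError(error: string | null): ExplainOutcome {
  if (error === null || error === "") return "ok";
  if (error.includes("gradient_fallback")) return "fallback";
  return "failed";
}
```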

7. Explain — Captum IG

POST /api/explain/captum
Content-Type: application/json

Generates subword token attributions using Layer Integrated Gradients (Captum). Works at the subword tokenizer level, so words may appear as ▁महिलाले (SentencePiece prefix).

Request body

{
  "text": "महिलाले घरमा बस्नु पर्छ",
  "n_steps": 50
}

Field	Type	Required	Default	Notes
`text`	`string`	✅	—	1–2000 chars
`n_steps`	`integer`	❌	`50`	Range 10–200. Increase to 100+ if `convergence_delta > 0.05`
`num_samples`	`integer`	❌	`200`	Only used by LIME, ignored here

Response 200

{
  "method": "Captum-IG",
  "prediction": "OS",
  "confidence": 0.9909,
  "word_scores": [
    { "word": "महिलाले", "score": 0.842 },
    { "word": "घरमा", "score": 0.631 },
    { "word": "बस्नु", "score": 0.417 },
    { "word": "पर्छ", "score": 0.203 }
  ],
  "preprocessed_text": "महिलाले घरमा बस्नु पर्छ",
  "convergence_delta": 0.0031,
  "error": null
}

Field	Notes
`word_scores[].score`	Signed attribution (sum of subword attributions). Positive = contributes to prediction
`convergence_delta`	Quality indicator. Values below `0.05` = reliable. Increase `n_steps` if high

⚠️ Memory warning: Captum is the most memory-intensive method. On low-RAM cloud deployments it may fail with 503 (out of memory; see the Error Reference). Use LIME or SHAP as a fallback. The frontend should check captum in /api/status before showing this option.
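
The `convergence_delta` guidance above can be expressed as a small retry rule. This is a sketch under assumptions: `nextNSteps` is a hypothetical name, doubling the step count is an arbitrary policy chosen here, and the delta is compared by absolute value in case the backend reports a signed number.

```typescript
const DELTA_THRESHOLD = 0.05; // documented reliability threshold
const MIN_RETRY_STEPS = 100;  // "increase to 100+"
const MAX_STEPS = 200;        // upper end of the documented n_steps range

// Hypothetical helper: returns the n_steps to retry with, or null when no retry is useful.
function nextNSteps(convergenceDelta: number | null, currentSteps: number): number | null {
  if (convergenceDelta === null) return null;                      // not a Captum response
  if (Math.abs(convergenceDelta) <= DELTA_THRESHOLD) return null;  // already reliable
  if (currentSteps >= MAX_STEPS) return null;                      // nothing left to increase
  return Math.min(MAX_STEPS, Math.max(MIN_RETRY_STEPS, currentSteps * 2));
}
```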

8. Batch Predict (Streaming)

POST /api/batch
Content-Type: application/json

Classifies multiple texts and streams results back as NDJSON (Newline-Delimited JSON). Each text is processed independently — an error on one does not abort the batch.

Request body

{
  "texts": [
    "यो राम्रो छ",
    "तिमी मुर्ख हौ",
    "timi murkha chau"
  ]
}

Field	Type	Required	Notes
`texts`	`string[]`	✅	1–200 items. Empty strings are stripped silently

Response — NDJSON stream

Content-Type: application/x-ndjson

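Each line is a complete JSON object (the examples below are pretty-printed for readability; on the wire each object occupies a single line). Two types of lines: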

Progress line (one per text):

{
  "index": 0,
  "total": 3,
  "result": {
    "text": "यो राम्रो छ",
    "full_text": "यो राम्रो छ",
    "prediction": "NO",
    "confidence": 0.9721,
    "preprocessed_text": "यो राम्रो छ"
  }
}

Final sentinel line (last line always):

{ "done": true, "total": 3 }

Error result (when one text fails):

{
  "index": 1,
  "total": 3,
  "result": {
    "text": "...",
    "full_text": "...",
    "prediction": "Error",
    "confidence": 0.0,
    "preprocessed_text": "",
    "error": "error message"
  }
}

Frontend streaming example (fetch API):

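// `texts`, `setProgress`, and `appendResult` come from the calling component.
const response = await fetch("http://localhost:8000/api/batch", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ texts }),
});
if (!response.ok || !response.body) {
  throw new Error(`Batch request failed with status ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

// Parses one NDJSON line and updates the UI.
const handleLine = (line) => {
  if (!line.trim()) return;
  const data = JSON.parse(line);

  if (data.done) {
    // Batch complete
    setProgress(100);
  } else {
    // Update progress bar and results table
    setProgress(Math.round(((data.index + 1) / data.total) * 100));
    appendResult(data.result);
  }
};

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop(); // keep incomplete last line
  lines.forEach(handleLine);
}

// Flush bytes still held by the decoder, then handle a final line sent without a trailing newline.
buffer += decoder.decode();
handleLine(buffer);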
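
The buffering step in the example above can also be isolated as a pure, typed function, which makes it easier to unit-test than the read loop. The sketch below is one possible shape: `drainNdjson` and `isDoneLine` are hypothetical names, and the `JSON.parse` result is cast to the documented line types without runtime validation.

```typescript
interface BatchResult {
  text: string;
  full_text: string;
  prediction: string;
  confidence: number;
  preprocessed_text: string;
  error?: string;
}
interface BatchProgressLine { index: number; total: number; result: BatchResult; }
interface BatchDoneLine { done: true; total: number; }
type BatchStreamLine = BatchProgressLine | BatchDoneLine;

// Type guard for the final sentinel line.
function isDoneLine(line: BatchStreamLine): line is BatchDoneLine {
  return "done" in line && line.done === true;
}

// Splits the buffered text into complete NDJSON lines plus the unfinished remainder.
function drainNdjson(buffer: string): { lines: BatchStreamLine[]; rest: string } {
  const parts = buffer.split("\n");
  const rest = parts.pop() ?? ""; // text after the last newline is not yet complete
  const lines = parts
    .filter((part) => part.trim() !== "")
    .map((part) => JSON.parse(part) as BatchStreamLine);
  return { lines, rest };
}
```

Each decoded chunk is appended to `rest` from the previous call before draining again, so a JSON object split across two network chunks is parsed only once it is complete.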

9. History — Fetch

GET /api/history?limit=100&offset=0

Returns saved predictions in reverse-chronological order (newest first).

Query parameters

Param	Type	Default	Range	Description
`limit`	`integer`	`100`	1–500	Max records to return
`offset`	`integer`	`0`	≥0	Skip this many records from the newest end

Response 200

{
  "items": [
    {
      "timestamp": "2026-04-10T10:23:41.123456",
      "text": "तिमी मुर्ख हौ",
      "prediction": "OO",
      "confidence": 0.8732,
      "probabilities": {
        "NO": 0.08,
        "OO": 0.87,
        "OR": 0.03,
        "OS": 0.02
      },
      "preprocessed_text": "तिमी मुर्ख हौ",
      "emoji_features": { "total_emoji_count": 0, "..." : "..." }
    }
  ],
  "total": 42,
  "limit": 100,
  "offset": 0
}

Pagination example:

Page 1: GET /api/history?limit=20&offset=0
Page 2: GET /api/history?limit=20&offset=20
Page 3: GET /api/history?limit=20&offset=40
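
The page-to-offset arithmetic in the example can be sketched as below. `historyQuery` and `pageCount` are hypothetical helper names; clamping `limit` to the documented 1–500 range on the client is a choice made in this sketch.

```typescript
// Builds the history URL path for a 1-based page number.
function historyQuery(page: number, pageSize: number): string {
  const limit = Math.min(Math.max(Math.trunc(pageSize), 1), 500); // documented range 1-500
  const offset = Math.max(Math.trunc(page) - 1, 0) * limit;       // page 1 starts at offset 0
  return `/api/history?limit=${limit}&offset=${offset}`;
}

// Number of pages needed for `total` records (the `total` field of the response).
function pageCount(total: number, limit: number): number {
  return Math.ceil(total / limit);
}
```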

10. History — Stats

GET /api/history/stats

Returns aggregated statistics without fetching every record. Use for the dashboard summary row.

Response 200 (with history)

{
  "total": 42,
  "avg_confidence": 0.8741,
  "class_counts": {
    "NO": 18,
    "OO": 12,
    "OR": 5,
    "OS": 7
  },
  "most_common_class": "NO"
}

Response 200 (empty history)

{
  "total": 0,
  "avg_confidence": null,
  "class_counts": {},
  "most_common_class": null
}
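
For a dashboard that shows class shares rather than raw counts, the stats payload could be post-processed as sketched here. `classShares` is a hypothetical helper; it returns an empty object for the empty-history response so the caller never divides by zero.

```typescript
interface HistoryStats {
  total: number;
  avg_confidence: number | null;
  class_counts: Record<string, number>;
  most_common_class: string | null;
}

// Hypothetical helper: fraction of saved predictions per class (0 to 1).
function classShares(stats: HistoryStats): Record<string, number> {
  const shares: Record<string, number> = {};
  if (stats.total === 0) return shares; // empty history: nothing to divide
  for (const [label, count] of Object.entries(stats.class_counts)) {
    shares[label] = count / stats.total;
  }
  return shares;
}
```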

11. History — Clear

DELETE /api/history

Permanently deletes the history file. No confirmation prompt — handle that in the UI.

Response 200

{
  "message": "History cleared. 42 record(s) deleted.",
  "deleted_count": 42
}

Response 404 (if already empty)

{
  "detail": "History is already empty — nothing to clear."
}
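
Since a 404 here only means there was nothing to delete, a client may prefer to treat it as a successful no-op. A minimal sketch, assuming the caller has already read the status code and parsed the JSON body; `deletedCountFrom` and `ClearBody` are hypothetical names.

```typescript
interface ClearBody {
  message?: string;
  deleted_count?: number;
  detail?: string;
}

// Hypothetical helper: maps the DELETE /api/history outcome to a record count.
function deletedCountFrom(status: number, body: ClearBody): number {
  if (status === 200) return body.deleted_count ?? 0;
  if (status === 404) return 0; // history was already empty
  throw new Error(body.detail ?? `Unexpected status ${status}`);
}
```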

12. Error Reference

All error responses follow FastAPI's standard shape:

{
  "detail": "Human-readable error message"
}

Status	Meaning	When it happens
`422`	Validation error	Empty text, batch > 200, invalid field types
`503`	Service unavailable	Model still loading at startup, out of memory
`404`	Not found	History already empty on DELETE
`500`	Internal server error	Unexpected exception in inference or XAI
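
A client-side error normalizer built on this table might look like the sketch below. Two assumptions are worth stating: FastAPI's default handler for request validation errors returns `detail` as a list of objects with a `msg` field rather than a string, and this document does not say whether the backend overrides that, so both shapes are handled; and treating only 503 as retryable is a policy chosen for this sketch. `describeError` and `ApiError` are hypothetical names.

```typescript
interface ApiError {
  status: number;
  message: string;
  retryable: boolean;
}

// Hypothetical helper: turns a status code and the parsed `detail` value into a UI-friendly error.
function describeError(status: number, detail: unknown): ApiError {
  let message: string;
  if (typeof detail === "string") {
    message = detail;
  } else if (Array.isArray(detail)) {
    // Default FastAPI validation errors: [{ loc, msg, type }, ...]
    message = detail
      .map((item) => (item && typeof item.msg === "string" ? item.msg : JSON.stringify(item)))
      .join("; ");
  } else {
    message = `Request failed with status ${status}`;
  }
  // Assumption: 503 (model loading or out of memory) is worth retrying; other codes are not.
  return { status, message, retryable: status === 503 };
}
```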

13. TypeScript Types

Copy these into your React/Vite project:

// ── Labels ──────────────────────────────────────────────────────────────────
export type Label = "NO" | "OO" | "OR" | "OS" | "Error";

export const LABEL_META = {
  NO: { text: "Non-Offensive",     color: "#28a745" },
  OO: { text: "Other-Offensive",   color: "#ffc107" },
  OR: { text: "Offensive-Racist",  color: "#dc3545" },
  OS: { text: "Offensive-Sexist",  color: "#6f42c1" },
  Error: { text: "Error",          color: "#6c757d" },
} as const;

// ── /health ──────────────────────────────────────────────────────────────────
export interface HealthResponse {
  status: string;
  model_loaded: boolean;
  device: string;
}

// ── /api/status ───────────────────────────────────────────────────────────────
export interface StatusResponse {
  model_loaded: boolean;
  device: string;
  preprocessor: boolean;
  lime: boolean;
  shap: boolean;
  captum: boolean;
}

// ── /api/predict ──────────────────────────────────────────────────────────────
export interface PredictRequest {
  text: string;
  save_to_history?: boolean;
}

export interface EmojiFeatures {
  has_hate_emoji: number;
  has_mockery_emoji: number;
  has_positive_emoji: number;
  has_sadness_emoji: number;
  has_fear_emoji: number;
  has_disgust_emoji: number;
  hate_emoji_count: number;
  mockery_emoji_count: number;
  positive_emoji_count: number;
  sadness_emoji_count: number;
  fear_emoji_count: number;
  disgust_emoji_count: number;
  total_emoji_count: number;
  hate_to_positive_ratio: number;
  has_mixed_sentiment: number;
  unknown_emoji_count: number;
  has_unknown_emoji: number;
  known_emoji_ratio: number;
}

export interface ScriptInfo {
  script_type: "devanagari" | "romanized_nepali" | "english" | "mixed" | "other";
  confidence: number;
}

export interface PredictResponse {
  prediction: Label;
  confidence: number;
  probabilities: Record<Exclude<Label, "Error">, number>; // the four class labels only
  original_text: string;
  preprocessed_text: string;
  emoji_features: EmojiFeatures;
  script_info: ScriptInfo | null;
  error: string | null;
}

// ── /api/analyze ──────────────────────────────────────────────────────────────
export interface AnalyzeRequest {
  text: string;
}

export interface EmojiInfo {
  emojis_found: string[];
  total_count: number;
  known_emojis: string[];
  known_count: number;
  unknown_emojis: string[];
  unknown_count: number;
  coverage: number;
}

export interface AnalyzeResponse {
  script_info: ScriptInfo;
  emoji_info: EmojiInfo;
}

// ── /api/explain/* ────────────────────────────────────────────────────────────
export interface ExplainRequest {
  text: string;
  num_samples?: number; // LIME only, default 200
  n_steps?: number;     // Captum only, default 50
}

export interface WordScore {
  word: string;
  score: number;
}

export interface ExplainResponse {
  method: "LIME" | "SHAP" | "Captum-IG";
  prediction: Label;
  confidence: number;
  word_scores: WordScore[];
  preprocessed_text: string;
  convergence_delta: number | null; // Captum only
  error: string | null;
}

// ── /api/batch ────────────────────────────────────────────────────────────────
export interface BatchRequest {
  texts: string[];
}

export interface BatchResult {
  text: string;          // truncated to 80 chars
  full_text: string;
  prediction: Label;
  confidence: number;
  preprocessed_text: string;
  error?: string;
}

export interface BatchProgressLine {
  index: number;
  total: number;
  result: BatchResult;
}

export interface BatchDoneLine {
  done: true;
  total: number;
}

export type BatchStreamLine = BatchProgressLine | BatchDoneLine;

// ── /api/history ──────────────────────────────────────────────────────────────
export interface HistoryItem {
  timestamp: string; // ISO 8601
  text: string;
  prediction: Label;
  confidence: number;
  probabilities: Record<string, number>;
  preprocessed_text: string;
  emoji_features: EmojiFeatures;
}

export interface HistoryResponse {
  items: HistoryItem[];
  total: number;
  limit: number;
  offset: number;
}

export interface HistoryStatsResponse {
  total: number;
  avg_confidence: number | null;
  class_counts: Record<string, number>;
  most_common_class: string | null;
}

14. Frontend Integration Guide

Recommended call order on app load

1. GET /health            → poll until model_loaded === true
2. GET /api/status        → store capabilities, show/hide XAI buttons
3. Ready to accept input

Single prediction flow

user submits text
  → POST /api/predict
  → show prediction badge (color from LABEL_META)
  → show probability bar chart (4 bars)
  → show preprocessing details (script_info + emoji_features)
  → if emoji_features.total_emoji_count > 0, show emoji breakdown panel

Explainability flow

user selects LIME / SHAP / Captum tab
  → check status.lime / status.shap / status.captum before enabling tab
  → POST /api/explain/lime  (or /shap or /captum)
  → render horizontal bar chart from word_scores
    - sort by abs(score) descending
    - positive score → green bar
    - negative score → red bar
  → for Captum: show convergence_delta warning if > 0.05

Batch flow

user pastes texts or uploads CSV
  → POST /api/batch
  → read response as NDJSON stream (see streaming example in §8)
  → update progress bar: (index + 1) / total * 100
  → append each result to results table as it arrives
  → on { done: true }, finalize and enable download CSV

History flow

on History tab open:
  → GET /api/history/stats   → show summary metrics
  → GET /api/history?limit=20&offset=0  → show table

pagination:
  → GET /api/history?limit=20&offset=N

clear button:
  → confirm in UI first
  → DELETE /api/history

CORS

The backend allows requests from http://localhost:5173 (Vite default) and http://localhost:3000 (CRA default). If you deploy the frontend to a different URL, set the FRONTEND_URL environment variable before starting the server:

FRONTEND_URL=https://yourapp.vercel.app uvicorn backend.app.main:app ...

Environment variables

Variable	Default	Description
`MODEL_PATH`	`models/saved_models/xlm_roberta_results/large_final`	Local model path. Falls back to HuggingFace if not found
`HF_MODEL_ID`	`UDHOV/xlm-roberta-large-nepali-hate-classification`	HuggingFace model ID
`HISTORY_FILE`	`data/prediction_history.jsonl`	History file location
`FRONTEND_URL`	`""`	Additional CORS origin for deployed frontend
