Spaces:
Sleeping
Sleeping
Upload folder using huggingface_hub
Browse filesThis view is limited to 50 files because it contains too many changes. See raw diff
- .dockerignore +43 -0
- Dockerfile +30 -0
- README.md +15 -10
- Toxic_TweetTagger.egg-info/PKG-INFO +46 -0
- Toxic_TweetTagger.egg-info/SOURCES.txt +27 -0
- Toxic_TweetTagger.egg-info/dependency_links.txt +1 -0
- Toxic_TweetTagger.egg-info/requires.txt +25 -0
- Toxic_TweetTagger.egg-info/top_level.txt +4 -0
- __init__.py +0 -0
- __pycache__/__init__.cpython-311.pyc +0 -0
- app/__pycache__/main.cpython-311.pyc +0 -0
- app/api/__pycache__/api_routes.cpython-311.pyc +0 -0
- app/api/__pycache__/dependencies.cpython-311.pyc +0 -0
- app/api/__pycache__/schemas.cpython-311.pyc +0 -0
- app/api/api_routes.py +39 -0
- app/api/dependencies.py +17 -0
- app/api/schemas.py +44 -0
- app/main.py +86 -0
- app/middleware/__init__.py +66 -0
- app/middleware/__pycache__/__init__.cpython-311.pyc +0 -0
- app/model/MLmodel +31 -0
- app/model/artifacts/booster.json +0 -0
- app/model/artifacts/metrics.json +1 -0
- app/model/artifacts/model.joblib +3 -0
- app/model/artifacts/vectorizer.joblib +3 -0
- app/model/conda.yaml +15 -0
- app/model/python_env.yaml +7 -0
- app/model/python_model.pkl +3 -0
- app/model/registered_model_meta +2 -0
- app/model/requirements.txt +8 -0
- app/monitoring/__init__.py +0 -0
- app/monitoring/__pycache__/__init__.cpython-311.pyc +0 -0
- app/monitoring/__pycache__/http_metrics.cpython-311.pyc +0 -0
- app/monitoring/__pycache__/service_metrics.cpython-311.pyc +0 -0
- app/monitoring/http_metrics.py +20 -0
- app/monitoring/service_metrics.py +62 -0
- app/requirements.txt +7 -0
- app/services/__pycache__/explainer.cpython-311.pyc +0 -0
- app/services/__pycache__/feedback.cpython-311.pyc +0 -0
- app/services/__pycache__/inference.cpython-311.pyc +0 -0
- app/services/explainer.py +55 -0
- app/services/feedback.py +38 -0
- app/services/inference.py +113 -0
- app/workers/__init__.py +144 -0
- app/workers/__pycache__/__init__.cpython-311.pyc +0 -0
- components/__init__.py +0 -0
- components/__pycache__/__init__.cpython-311.pyc +0 -0
- components/__pycache__/data_ingestion.cpython-311.pyc +0 -0
- components/__pycache__/data_preprocessing.cpython-311.pyc +0 -0
- components/__pycache__/data_validation.cpython-311.pyc +0 -0
.dockerignore
ADDED
|
@@ -0,0 +1,43 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Ignore Python cache
|
| 2 |
+
__pycache__/
|
| 3 |
+
*.py[cod]
|
| 4 |
+
*.so
|
| 5 |
+
|
| 6 |
+
# Ignore Jupyter notebooks (if not used)
|
| 7 |
+
*.ipynb
|
| 8 |
+
.ipynb_checkpoints/
|
| 9 |
+
|
| 10 |
+
# Ignore logs and temp files
|
| 11 |
+
*.log
|
| 12 |
+
logs/
|
| 13 |
+
*.tmp
|
| 14 |
+
*.DS_Store
|
| 15 |
+
|
| 16 |
+
# Ignore version control and dev files
|
| 17 |
+
.git/
|
| 18 |
+
.github/
|
| 19 |
+
.vscode/
|
| 20 |
+
*.env
|
| 21 |
+
.env*
|
| 22 |
+
.gitignore
|
| 23 |
+
|
| 24 |
+
# MLflow & DVC metadata (keep only if you need them at runtime)
|
| 25 |
+
.mlflow/
|
| 26 |
+
.dvc/
|
| 27 |
+
.dvcignore
|
| 28 |
+
|
| 29 |
+
# CI/CD config files
|
| 30 |
+
tox.ini
|
| 31 |
+
pytest.ini
|
| 32 |
+
setup.cfg
|
| 33 |
+
setup.py
|
| 34 |
+
requirements-dev.txt
|
| 35 |
+
|
| 36 |
+
# Ignore Docker build context bloat
|
| 37 |
+
*.tar
|
| 38 |
+
*.zip
|
| 39 |
+
*.gz
|
| 40 |
+
*.egg-info/
|
| 41 |
+
|
| 42 |
+
# Ignore Hugging Face cache
|
| 43 |
+
~/.cache/huggingface/
|
Dockerfile
ADDED
|
@@ -0,0 +1,30 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
FROM python:3.11.11-slim-bookworm
|
| 2 |
+
|
| 3 |
+
RUN apt-get update && \
|
| 4 |
+
apt-get install --no-install-recommends -y build-essential && \
|
| 5 |
+
rm -rf /var/lib/apt/lists/*
|
| 6 |
+
|
| 7 |
+
WORKDIR /app
|
| 8 |
+
|
| 9 |
+
COPY . /app
|
| 10 |
+
|
| 11 |
+
RUN pip install --no-cache-dir --upgrade pip && \
|
| 12 |
+
pip install --no-cache-dir -r app/requirements.txt -r app/model/requirements.txt
|
| 13 |
+
|
| 14 |
+
RUN mkdir -p /tmp/prometheus_metrics && \
|
| 15 |
+
chmod 777 /tmp/prometheus_metrics
|
| 16 |
+
|
| 17 |
+
COPY start.sh /start.sh
|
| 18 |
+
RUN chmod +x /start.sh
|
| 19 |
+
|
| 20 |
+
RUN useradd -m appuser
|
| 21 |
+
USER appuser
|
| 22 |
+
|
| 23 |
+
EXPOSE 7860
|
| 24 |
+
ENV HOST=0.0.0.0 \
|
| 25 |
+
PORT=7860 \
|
| 26 |
+
PYTHONUNBUFFERED=1 \
|
| 27 |
+
PROMETHEUS_MULTIPROC_DIR=/tmp/prometheus_metrics
|
| 28 |
+
|
| 29 |
+
ENTRYPOINT ["/start.sh"]
|
| 30 |
+
CMD ["gunicorn", "-k", "uvicorn.workers.UvicornWorker", "app.main:app", "--workers", "2", "--bind", "0.0.0.0:7860"]
|
README.md
CHANGED
|
@@ -1,10 +1,15 @@
|
|
| 1 |
-
---
|
| 2 |
-
title:
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
-
colorTo:
|
| 6 |
-
sdk: docker
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: Toxic Tweet Tagger
|
| 3 |
+
emoji: 🤖
|
| 4 |
+
colorFrom: indigo
|
| 5 |
+
colorTo: purple
|
| 6 |
+
sdk: docker
|
| 7 |
+
app_port: 7860
|
| 8 |
+
python_version: "3.11"
|
| 9 |
+
app_file: app.py
|
| 10 |
+
pinned: false
|
| 11 |
+
---
|
| 12 |
+
|
| 13 |
+
# Toxic Tweet Tagger
|
| 14 |
+
|
| 15 |
+
A machine learning app that detects toxic tweets and explains predictions using LIME.
|
Toxic_TweetTagger.egg-info/PKG-INFO
ADDED
|
@@ -0,0 +1,46 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
Metadata-Version: 2.4
|
| 2 |
+
Name: Toxic-TweetTagger
|
| 3 |
+
Version: 0.1.0
|
| 4 |
+
Summary: End to end Hate-Tweet detection automation with MLOps implementation
|
| 5 |
+
Home-page: https://github.com/SubinoyBera/Toxic-TweetTagger
|
| 6 |
+
Author: Subinoy Bera
|
| 7 |
+
Author-email: subinoyberadgp@gmail.com
|
| 8 |
+
License: Apache-2.0
|
| 9 |
+
Classifier: Topic :: Engineering/Automation :: ML/MLOps
|
| 10 |
+
Classifier: Development Status :: 4 - Beta
|
| 11 |
+
Classifier: Intended Audience :: Developers
|
| 12 |
+
Classifier: License :: OSI Approved :: MIT License
|
| 13 |
+
Classifier: Programming Language :: Python :: 3.11
|
| 14 |
+
Classifier: Programming Language :: Python :: 3.12
|
| 15 |
+
Classifier: Operating System :: OS Independent
|
| 16 |
+
Requires-Python: >=3.11
|
| 17 |
+
Description-Content-Type: text/markdown
|
| 18 |
+
License-File: LICENSE
|
| 19 |
+
Requires-Dist: numpy==2.2.6
|
| 20 |
+
Requires-Dist: pandas==2.3.1
|
| 21 |
+
Requires-Dist: scipy==1.13.1
|
| 22 |
+
Requires-Dist: scikit-learn==1.7.0
|
| 23 |
+
Requires-Dist: xgboost==3.0.2
|
| 24 |
+
Requires-Dist: nltk==3.9.1
|
| 25 |
+
Requires-Dist: python-box==7.3.2
|
| 26 |
+
Requires-Dist: ensure==1.0.4
|
| 27 |
+
Requires-Dist: PyYAML==6.0.2
|
| 28 |
+
Requires-Dist: dvc==3.61.0
|
| 29 |
+
Requires-Dist: mlflow==2.22.1
|
| 30 |
+
Requires-Dist: dagshub==0.5.10
|
| 31 |
+
Requires-Dist: fastapi==0.116.1
|
| 32 |
+
Requires-Dist: pydantic==2.11.7
|
| 33 |
+
Requires-Dist: tqdm==4.67.1
|
| 34 |
+
Requires-Dist: requests==2.32.4
|
| 35 |
+
Requires-Dist: pytest==8.4.1
|
| 36 |
+
Requires-Dist: tox==4.11.3
|
| 37 |
+
Provides-Extra: testing
|
| 38 |
+
Requires-Dist: pytest>=8.0.0; extra == "testing"
|
| 39 |
+
Requires-Dist: black>=25.0.0; extra == "testing"
|
| 40 |
+
Requires-Dist: flake8>=6.0.0; extra == "testing"
|
| 41 |
+
Requires-Dist: mypy>=1.5.0; extra == "testing"
|
| 42 |
+
Requires-Dist: tox>=4.0.0; extra == "testing"
|
| 43 |
+
Dynamic: license-file
|
| 44 |
+
|
| 45 |
+
# Toxic-TweetTagger
|
| 46 |
+
Hate speech detection
|
Toxic_TweetTagger.egg-info/SOURCES.txt
ADDED
|
@@ -0,0 +1,27 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
LICENSE
|
| 2 |
+
README.md
|
| 3 |
+
setup.cfg
|
| 4 |
+
setup.py
|
| 5 |
+
src/Toxic_TweetTagger.egg-info/PKG-INFO
|
| 6 |
+
src/Toxic_TweetTagger.egg-info/SOURCES.txt
|
| 7 |
+
src/Toxic_TweetTagger.egg-info/dependency_links.txt
|
| 8 |
+
src/Toxic_TweetTagger.egg-info/requires.txt
|
| 9 |
+
src/Toxic_TweetTagger.egg-info/top_level.txt
|
| 10 |
+
src/components/__init__.py
|
| 11 |
+
src/components/data_ingestion.py
|
| 12 |
+
src/components/data_preprocessing.py
|
| 13 |
+
src/components/feature_engineering.py
|
| 14 |
+
src/components/model_evaluation.py
|
| 15 |
+
src/components/model_training.py
|
| 16 |
+
src/components/register_model.py
|
| 17 |
+
src/constant/__init__.py
|
| 18 |
+
src/constant/constants.py
|
| 19 |
+
src/core/__init__.py
|
| 20 |
+
src/core/config_entity.py
|
| 21 |
+
src/core/configuration.py
|
| 22 |
+
src/core/exception.py
|
| 23 |
+
src/core/logger.py
|
| 24 |
+
src/pipeline/__init__.py
|
| 25 |
+
src/pipeline/ml_pipeline.py
|
| 26 |
+
tests/test_app.py
|
| 27 |
+
tests/test_model.py
|
Toxic_TweetTagger.egg-info/dependency_links.txt
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
|
Toxic_TweetTagger.egg-info/requires.txt
ADDED
|
@@ -0,0 +1,25 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
numpy==2.2.6
|
| 2 |
+
pandas==2.3.1
|
| 3 |
+
scipy==1.13.1
|
| 4 |
+
scikit-learn==1.7.0
|
| 5 |
+
xgboost==3.0.2
|
| 6 |
+
nltk==3.9.1
|
| 7 |
+
python-box==7.3.2
|
| 8 |
+
ensure==1.0.4
|
| 9 |
+
PyYAML==6.0.2
|
| 10 |
+
dvc==3.61.0
|
| 11 |
+
mlflow==2.22.1
|
| 12 |
+
dagshub==0.5.10
|
| 13 |
+
fastapi==0.116.1
|
| 14 |
+
pydantic==2.11.7
|
| 15 |
+
tqdm==4.67.1
|
| 16 |
+
requests==2.32.4
|
| 17 |
+
pytest==8.4.1
|
| 18 |
+
tox==4.11.3
|
| 19 |
+
|
| 20 |
+
[testing]
|
| 21 |
+
pytest>=8.0.0
|
| 22 |
+
black>=25.0.0
|
| 23 |
+
flake8>=6.0.0
|
| 24 |
+
mypy>=1.5.0
|
| 25 |
+
tox>=4.0.0
|
Toxic_TweetTagger.egg-info/top_level.txt
ADDED
|
@@ -0,0 +1,4 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
components
|
| 2 |
+
constant
|
| 3 |
+
core
|
| 4 |
+
pipeline
|
__init__.py
ADDED
|
File without changes
|
__pycache__/__init__.cpython-311.pyc
ADDED
|
Binary file (152 Bytes). View file
|
|
|
app/__pycache__/main.cpython-311.pyc
ADDED
|
Binary file (5.04 kB). View file
|
|
|
app/api/__pycache__/api_routes.cpython-311.pyc
ADDED
|
Binary file (3.36 kB). View file
|
|
|
app/api/__pycache__/dependencies.cpython-311.pyc
ADDED
|
Binary file (1.18 kB). View file
|
|
|
app/api/__pycache__/schemas.cpython-311.pyc
ADDED
|
Binary file (3.97 kB). View file
|
|
|
app/api/api_routes.py
ADDED
|
@@ -0,0 +1,39 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from fastapi import APIRouter, Request, Depends
|
| 2 |
+
from prometheus_client import generate_latest, CollectorRegistry, multiprocess, CONTENT_TYPE_LATEST
|
| 3 |
+
from fastapi.responses import Response, HTMLResponse
|
| 4 |
+
|
| 5 |
+
from src.app.services.inference import InferenceService
|
| 6 |
+
from src.app.services.explainer import ExplainerService
|
| 7 |
+
from src.app.services.feedback import FeedbackService
|
| 8 |
+
from src.app.api.dependencies import get_inference_service, get_explainer_service, get_feedback_service
|
| 9 |
+
from src.app.api.schemas import InferenceRequest, InferenceResponse, FeedbackRequest, ExplanationRequest
|
| 10 |
+
|
| 11 |
+
router = APIRouter()
|
| 12 |
+
|
| 13 |
+
@router.get("/health")
|
| 14 |
+
async def health_check():
|
| 15 |
+
return {"status": "ok"}
|
| 16 |
+
|
| 17 |
+
@router.post("/predict", response_model=InferenceResponse)
|
| 18 |
+
async def predict(request: Request, payload: InferenceRequest, service: InferenceService = Depends(get_inference_service)):
|
| 19 |
+
request_id = request.state.request_id
|
| 20 |
+
return service.predict(request_id, payload.input_tweet, payload.text)
|
| 21 |
+
|
| 22 |
+
@router.post("/explain")
|
| 23 |
+
async def explain(payload: ExplanationRequest, service: ExplainerService = Depends(get_explainer_service)):
|
| 24 |
+
return HTMLResponse(service.explain(payload.input_tweet))
|
| 25 |
+
|
| 26 |
+
@router.post("/submit_feedback")
|
| 27 |
+
async def submit_feedback(request: Request, payload: FeedbackRequest, service: FeedbackService = Depends(get_feedback_service)):
|
| 28 |
+
request_id = request.state.request_id
|
| 29 |
+
return service.submit_feedback(request_id, payload.predicted_label, payload.feedback_label)
|
| 30 |
+
|
| 31 |
+
@router.get("/metrics")
|
| 32 |
+
def metrics():
|
| 33 |
+
registry = CollectorRegistry()
|
| 34 |
+
multiprocess.MultiProcessCollector(registry)
|
| 35 |
+
return Response(
|
| 36 |
+
generate_latest(registry),
|
| 37 |
+
media_type=CONTENT_TYPE_LATEST,
|
| 38 |
+
headers={"Cache-Control": "no-cache"}
|
| 39 |
+
)
|
app/api/dependencies.py
ADDED
|
@@ -0,0 +1,17 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from fastapi import Request
|
| 2 |
+
|
| 3 |
+
from src.app.services.inference import InferenceService
|
| 4 |
+
from src.app.services.explainer import ExplainerService
|
| 5 |
+
from src.app.services.feedback import FeedbackService
|
| 6 |
+
|
| 7 |
+
|
| 8 |
+
def get_inference_service(request: Request) -> InferenceService:
|
| 9 |
+
return request.app.state.prediction_service
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
def get_explainer_service(request: Request) -> ExplainerService:
|
| 13 |
+
return request.app.state.explainer_service
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
def get_feedback_service(request: Request) -> FeedbackService:
|
| 17 |
+
return request.app.state.feedback_service
|
app/api/schemas.py
ADDED
|
@@ -0,0 +1,44 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Schema validation for the API requests and responses
|
| 2 |
+
|
| 3 |
+
from pydantic import BaseModel, Field
|
| 4 |
+
from typing import Annotated, Dict, Literal, Optional
|
| 5 |
+
from datetime import datetime
|
| 6 |
+
|
| 7 |
+
|
| 8 |
+
class InferenceRequest(BaseModel):
|
| 9 |
+
input_tweet: Annotated[str, Field(..., description="Input tweet or comment text for classification")]
|
| 10 |
+
text: Annotated[str, Field(..., description="Preprocessed text to be fed to the model for prediction")]
|
| 11 |
+
|
| 12 |
+
class ExplanationRequest(BaseModel):
|
| 13 |
+
input_tweet: Annotated[str, Field(..., description="Input tweet or comment text for generating explanation")]
|
| 14 |
+
|
| 15 |
+
class PredictionResult(BaseModel):
|
| 16 |
+
label: int
|
| 17 |
+
confidence: float = Field(..., ge=0.0, le=1.0, description="Prediction probability")
|
| 18 |
+
toxicity: Literal["strong", "high", "uncertain", "none"]
|
| 19 |
+
|
| 20 |
+
class ModelInfoSchema(BaseModel):
|
| 21 |
+
name: str = Field(..., description="Model name")
|
| 22 |
+
version: int = Field(..., description="Model version")
|
| 23 |
+
vectorizer: str = Field(..., description="Vectorizer class name")
|
| 24 |
+
|
| 25 |
+
class MetadataSchema(BaseModel):
|
| 26 |
+
latency: float = Field(..., ge=0, description="Response time in seconds")
|
| 27 |
+
usage: Dict[str, float]
|
| 28 |
+
model: ModelInfoSchema
|
| 29 |
+
streamable: bool = Field(default=False)
|
| 30 |
+
environment: Literal["Standard", "Beta", "Production"]
|
| 31 |
+
api_version: str
|
| 32 |
+
|
| 33 |
+
class InferenceResponse(BaseModel):
|
| 34 |
+
id: str
|
| 35 |
+
timestamp: datetime
|
| 36 |
+
object: Literal["text-classification"]
|
| 37 |
+
prediction: PredictionResult
|
| 38 |
+
warnings: Optional[dict] = None
|
| 39 |
+
metadata: MetadataSchema
|
| 40 |
+
|
| 41 |
+
|
| 42 |
+
class FeedbackRequest(BaseModel):
|
| 43 |
+
predicted_label: Literal[0, 1]
|
| 44 |
+
feedback_label: Literal[0, 1]
|
app/main.py
ADDED
|
@@ -0,0 +1,86 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import sys
|
| 2 |
+
import yaml, json
|
| 3 |
+
from pathlib import Path
|
| 4 |
+
from fastapi import FastAPI
|
| 5 |
+
from contextlib import asynccontextmanager
|
| 6 |
+
|
| 7 |
+
import xgboost as xgb
|
| 8 |
+
from lime.lime_text import LimeTextExplainer
|
| 9 |
+
from src.core.constants import REGISTERED_MODELS_DIR
|
| 10 |
+
from src.app.workers import BufferedEventConsumerWorker
|
| 11 |
+
from src.app.services.inference import InferenceService
|
| 12 |
+
from src.app.services.feedback import FeedbackService
|
| 13 |
+
from src.app.services.explainer import ExplainerService
|
| 14 |
+
|
| 15 |
+
from src.utils import load_obj
|
| 16 |
+
from src.app.middleware import http_observability_middleware
|
| 17 |
+
from src.core.constants import DATABASE_NAME, PRODUCTION_COLLECTION_NAME, FEEDBACK_COLLECTION_NAME
|
| 18 |
+
from src.core.mongo_client import MongoDBClient
|
| 19 |
+
from src.core.logger import logging
|
| 20 |
+
from src.core.exception import AppException
|
| 21 |
+
|
| 22 |
+
from src.app.api.api_routes import router
|
| 23 |
+
|
| 24 |
+
@asynccontextmanager
|
| 25 |
+
async def lifespan(app: FastAPI):
|
| 26 |
+
# application state setup
|
| 27 |
+
try:
|
| 28 |
+
mongo_client = MongoDBClient()
|
| 29 |
+
# load model
|
| 30 |
+
xgb_booster = xgb.Booster()
|
| 31 |
+
xgb_booster.load_model(Path(REGISTERED_MODELS_DIR, "artifacts", "booster.json"))
|
| 32 |
+
xgb_booster.set_param({"nthread": 1})
|
| 33 |
+
# load vectorizer
|
| 34 |
+
vectorizer = load_obj(Path(REGISTERED_MODELS_DIR, "artifacts"), "vectorizer.joblib")
|
| 35 |
+
|
| 36 |
+
with open(Path(REGISTERED_MODELS_DIR, "artifacts/metrics.json"), 'r') as f:
|
| 37 |
+
metrics = json.load(f)
|
| 38 |
+
eval_threshold = metrics.get("threshold", 0.5)
|
| 39 |
+
|
| 40 |
+
# get model version
|
| 41 |
+
with open(Path("src/app/model/registered_model_meta"), 'r') as f:
|
| 42 |
+
model_metadata = yaml.safe_load(f)
|
| 43 |
+
if not model_metadata:
|
| 44 |
+
raise FileNotFoundError("Failed to load file having model metadata")
|
| 45 |
+
model_version = int(model_metadata.get("model_version", 0))
|
| 46 |
+
|
| 47 |
+
# initialize workers
|
| 48 |
+
prediction_event_consumer = BufferedEventConsumerWorker(mongo_client, DATABASE_NAME, PRODUCTION_COLLECTION_NAME)
|
| 49 |
+
feedback_event_consumer = BufferedEventConsumerWorker(mongo_client, DATABASE_NAME, FEEDBACK_COLLECTION_NAME)
|
| 50 |
+
|
| 51 |
+
# initialize services
|
| 52 |
+
app.state.prediction_service = InferenceService(xgb_booster, vectorizer, eval_threshold,
|
| 53 |
+
prediction_event_consumer, model_version)
|
| 54 |
+
|
| 55 |
+
app.state.feedback_service = FeedbackService(feedback_event_consumer)
|
| 56 |
+
|
| 57 |
+
lime_explainer = LimeTextExplainer(class_names=["hate", "non-hate"], bow=False)
|
| 58 |
+
app.state.explainer_service = ExplainerService(lime_explainer, xgb_booster, vectorizer)
|
| 59 |
+
|
| 60 |
+
logging.info("Infernce API app server started successfully")
|
| 61 |
+
|
| 62 |
+
except Exception as e:
|
| 63 |
+
logging.critical(f"Startup Failed: {e}", exc_info=True)
|
| 64 |
+
raise AppException(e, sys)
|
| 65 |
+
|
| 66 |
+
# run application
|
| 67 |
+
yield
|
| 68 |
+
|
| 69 |
+
# application shutdown
|
| 70 |
+
prediction_event_consumer.shutdown()
|
| 71 |
+
feedback_event_consumer.shutdown()
|
| 72 |
+
mongo_client.close_connection()
|
| 73 |
+
|
| 74 |
+
# Create FastAPI app
|
| 75 |
+
app = FastAPI(
|
| 76 |
+
title="Hate Speech Detection API",
|
| 77 |
+
version="2.0.0",
|
| 78 |
+
description="Production-grade ML inference API with monitoring and feedback system.",
|
| 79 |
+
lifespan=lifespan
|
| 80 |
+
)
|
| 81 |
+
|
| 82 |
+
# Register API routes
|
| 83 |
+
app.include_router(router, prefix="/api")
|
| 84 |
+
|
| 85 |
+
# Register middleware
|
| 86 |
+
app.middleware("http")(http_observability_middleware)
|
app/middleware/__init__.py
ADDED
|
@@ -0,0 +1,66 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import time
|
| 2 |
+
import uuid
|
| 3 |
+
from fastapi import Request
|
| 4 |
+
from src.core.logger import logging
|
| 5 |
+
from src.app.monitoring.http_metrics import HTTP_REQUESTS_TOTAL, HTTP_REQUEST_DURATION_SECONDS, HTTP_REQUESTS_IN_PROGRESS
|
| 6 |
+
|
| 7 |
+
async def http_observability_middleware(request: Request, call_next):
|
| 8 |
+
# Skip Prometheus metrics endpoint
|
| 9 |
+
if request.url.path == "/api/metrics":
|
| 10 |
+
return await call_next(request)
|
| 11 |
+
|
| 12 |
+
request_id = request.headers.get("X-Request-ID")
|
| 13 |
+
if not request_id:
|
| 14 |
+
request_id = str(uuid.uuid4())
|
| 15 |
+
|
| 16 |
+
request.state.request_id = request_id
|
| 17 |
+
|
| 18 |
+
method = request.method
|
| 19 |
+
route = request.scope.get("route")
|
| 20 |
+
path = route.path if route else request.url.path
|
| 21 |
+
|
| 22 |
+
start_time = time.perf_counter()
|
| 23 |
+
|
| 24 |
+
logging.info(f"[{request_id}] Incoming request {method} {path}")
|
| 25 |
+
|
| 26 |
+
HTTP_REQUESTS_IN_PROGRESS.inc()
|
| 27 |
+
|
| 28 |
+
try:
|
| 29 |
+
response = await call_next(request)
|
| 30 |
+
status_code = response.status_code
|
| 31 |
+
|
| 32 |
+
except Exception:
|
| 33 |
+
duration = time.perf_counter() - start_time
|
| 34 |
+
|
| 35 |
+
HTTP_REQUESTS_TOTAL.labels(
|
| 36 |
+
method=method,
|
| 37 |
+
path=path,
|
| 38 |
+
status="500",
|
| 39 |
+
).inc()
|
| 40 |
+
|
| 41 |
+
HTTP_REQUEST_DURATION_SECONDS.labels(
|
| 42 |
+
method=method,
|
| 43 |
+
path=path,
|
| 44 |
+
).observe(duration)
|
| 45 |
+
|
| 46 |
+
HTTP_REQUESTS_IN_PROGRESS.dec()
|
| 47 |
+
raise
|
| 48 |
+
|
| 49 |
+
duration = time.perf_counter() - start_time
|
| 50 |
+
|
| 51 |
+
HTTP_REQUESTS_TOTAL.labels(
|
| 52 |
+
method=method,
|
| 53 |
+
path=path,
|
| 54 |
+
status=str(status_code),
|
| 55 |
+
).inc()
|
| 56 |
+
|
| 57 |
+
HTTP_REQUEST_DURATION_SECONDS.labels(
|
| 58 |
+
method=method,
|
| 59 |
+
path=path,
|
| 60 |
+
).observe(duration)
|
| 61 |
+
|
| 62 |
+
HTTP_REQUESTS_IN_PROGRESS.dec()
|
| 63 |
+
|
| 64 |
+
response.headers["X-Request-ID"] = request_id
|
| 65 |
+
|
| 66 |
+
return response
|
app/middleware/__pycache__/__init__.cpython-311.pyc
ADDED
|
Binary file (2.7 kB). View file
|
|
|
app/model/MLmodel
ADDED
|
@@ -0,0 +1,31 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
artifact_path: XGB-v4
|
| 2 |
+
flavors:
|
| 3 |
+
python_function:
|
| 4 |
+
artifacts:
|
| 5 |
+
booster:
|
| 6 |
+
path: artifacts\booster.json
|
| 7 |
+
uri: D:\My Projects\Toxic Tweet-Tagger\test_models\booster.json
|
| 8 |
+
metrics:
|
| 9 |
+
path: artifacts\metrics.json
|
| 10 |
+
uri: D:\My Projects\Toxic Tweet-Tagger\test_models\metrics.json
|
| 11 |
+
model:
|
| 12 |
+
path: artifacts\model.joblib
|
| 13 |
+
uri: D:\My Projects\Toxic Tweet-Tagger\test_models\model.joblib
|
| 14 |
+
vectorizer:
|
| 15 |
+
path: artifacts\vectorizer.joblib
|
| 16 |
+
uri: D:\My Projects\Toxic Tweet-Tagger\test_models\vectorizer.joblib
|
| 17 |
+
cloudpickle_version: 3.1.1
|
| 18 |
+
code: null
|
| 19 |
+
env:
|
| 20 |
+
conda: conda.yaml
|
| 21 |
+
virtualenv: python_env.yaml
|
| 22 |
+
loader_module: mlflow.pyfunc.model
|
| 23 |
+
python_model: python_model.pkl
|
| 24 |
+
python_version: 3.11.5
|
| 25 |
+
streamable: false
|
| 26 |
+
mlflow_version: 2.22.1
|
| 27 |
+
model_size_bytes: 4509986
|
| 28 |
+
model_uuid: a2c7dc9359894fab899c13a88f9f9a4c
|
| 29 |
+
prompts: null
|
| 30 |
+
run_id: 9b6349ac22e04cf0a22257952299e056
|
| 31 |
+
utc_time_created: '2026-02-17 03:36:56.916972'
|
app/model/artifacts/booster.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
app/model/artifacts/metrics.json
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
{"threshold": 0.48, "accuracy": 0.80601184024014, "precision": 0.7922348788705278, "recall": 0.8272049585392411, "f1 score": 0.8093423478795329, "roc_auc": 0.8852078424195188}
|
app/model/artifacts/model.joblib
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:944104aefb800d70fccf6e65e545f3b3fbbbd79382e461ba06f98992f2e686fe
|
| 3 |
+
size 1240274
|
app/model/artifacts/vectorizer.joblib
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:1fea91a7b6fb18543f7a7e2717e44135985df11a339efbc30b410dbad8d962fe
|
| 3 |
+
size 116783
|
app/model/conda.yaml
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
channels:
|
| 2 |
+
- conda-forge
|
| 3 |
+
dependencies:
|
| 4 |
+
- python=3.11.5
|
| 5 |
+
- pip<=25.1
|
| 6 |
+
- pip:
|
| 7 |
+
- mlflow==2.22.1
|
| 8 |
+
- cloudpickle==3.1.1
|
| 9 |
+
- numpy==2.2.6
|
| 10 |
+
- pandas==2.3.1
|
| 11 |
+
- psutil==7.0.0
|
| 12 |
+
- scikit-learn==1.6.1
|
| 13 |
+
- scipy==1.13.1
|
| 14 |
+
- xgboost==3.0.2
|
| 15 |
+
name: mlflow-env
|
app/model/python_env.yaml
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
python: 3.11.5
|
| 2 |
+
build_dependencies:
|
| 3 |
+
- pip==25.1
|
| 4 |
+
- setuptools==80.9.0
|
| 5 |
+
- wheel==0.45.1
|
| 6 |
+
dependencies:
|
| 7 |
+
- -r requirements.txt
|
app/model/python_model.pkl
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a6472cb967ef181289c33621e9dbfd2437a53589d1ca75de4307ba20a0bae7ef
|
| 3 |
+
size 1378921
|
app/model/registered_model_meta
ADDED
|
@@ -0,0 +1,2 @@
|
|
|
|
|
|
|
|
|
|
| 1 |
+
model_name: ToxicTagger-Models
|
| 2 |
+
model_version: '26'
|
app/model/requirements.txt
ADDED
|
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
mlflow==2.22.1
|
| 2 |
+
cloudpickle==3.1.1
|
| 3 |
+
numpy==2.2.6
|
| 4 |
+
pandas==2.3.1
|
| 5 |
+
psutil==7.0.0
|
| 6 |
+
scikit-learn==1.6.1
|
| 7 |
+
scipy==1.13.1
|
| 8 |
+
xgboost==3.0.2
|
app/monitoring/__init__.py
ADDED
|
File without changes
|
app/monitoring/__pycache__/__init__.cpython-311.pyc
ADDED
|
Binary file (173 Bytes). View file
|
|
|
app/monitoring/__pycache__/http_metrics.cpython-311.pyc
ADDED
|
Binary file (931 Bytes). View file
|
|
|
app/monitoring/__pycache__/service_metrics.cpython-311.pyc
ADDED
|
Binary file (1.96 kB). View file
|
|
|
app/monitoring/http_metrics.py
ADDED
|
@@ -0,0 +1,20 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from prometheus_client import Counter, Histogram, Gauge
|
| 2 |
+
|
| 3 |
+
|
| 4 |
+
HTTP_REQUESTS_TOTAL = Counter(
|
| 5 |
+
"http_requests_total",
|
| 6 |
+
"Total number of HTTP requests handled by the inference service",
|
| 7 |
+
["method", "path", "status"],
|
| 8 |
+
)
|
| 9 |
+
|
| 10 |
+
HTTP_REQUEST_DURATION_SECONDS = Histogram(
|
| 11 |
+
"http_request_duration_seconds",
|
| 12 |
+
"HTTP request latency in seconds",
|
| 13 |
+
["method", "path"],
|
| 14 |
+
buckets=(0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0),
|
| 15 |
+
)
|
| 16 |
+
|
| 17 |
+
HTTP_REQUESTS_IN_PROGRESS = Gauge(
|
| 18 |
+
"http_requests_in_progress",
|
| 19 |
+
"Number of HTTP requests currently being processed",
|
| 20 |
+
)
|
app/monitoring/service_metrics.py
ADDED
|
@@ -0,0 +1,62 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from prometheus_client import Counter, Histogram
|
| 2 |
+
|
| 3 |
+
# Requests successfully served
|
| 4 |
+
PREDICTION_REQUEST_SUCCESS = Counter(
|
| 5 |
+
"predict_requests_success_total",
|
| 6 |
+
"Total successful prediction responses"
|
| 7 |
+
)
|
| 8 |
+
|
| 9 |
+
# Requests failed
|
| 10 |
+
PREDICTION_REQUEST_FAILED = Counter(
|
| 11 |
+
"predict_requests_failed_total",
|
| 12 |
+
"Total failed prediction requests"
|
| 13 |
+
)
|
| 14 |
+
|
| 15 |
+
# Prediction class distribution (hate / non-hate)
|
| 16 |
+
PREDICTION_CLASS = Counter(
|
| 17 |
+
"prediction_class_total",
|
| 18 |
+
"Count of predicted classes",
|
| 19 |
+
["class_label"] # label dimension
|
| 20 |
+
)
|
| 21 |
+
|
| 22 |
+
# Prediction label confidence
|
| 23 |
+
PREDICTION_CONFIDENCE = Histogram(
|
| 24 |
+
"prediction_confidence",
|
| 25 |
+
"Confidence distribution by predicted class",
|
| 26 |
+
["class_label"],
|
| 27 |
+
buckets=[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
|
| 28 |
+
)
|
| 29 |
+
|
| 30 |
+
# Inference responce time
|
| 31 |
+
INFERENCE_LATENCY = Histogram(
|
| 32 |
+
"model_inference_seconds",
|
| 33 |
+
"Model inference time in seconds",
|
| 34 |
+
buckets=[0.005, 0.01, 0.02, 0.03, 0.05, 0.1]
|
| 35 |
+
)
|
| 36 |
+
|
| 37 |
+
EXPLAINER_REQUEST_SUCCESS = Counter(
|
| 38 |
+
"explainer_requests_success_total",
|
| 39 |
+
"Total successful explain requests"
|
| 40 |
+
)
|
| 41 |
+
|
| 42 |
+
EXPLAINER_REQUEST_FAILED = Counter(
|
| 43 |
+
"explainer_requests_failed_total",
|
| 44 |
+
"Total failed explain requests"
|
| 45 |
+
)
|
| 46 |
+
|
| 47 |
+
# Feedback counter
|
| 48 |
+
FEEDBACK_REQUEST_SUCCESS = Counter(
|
| 49 |
+
"feedback_subissions_success_total",
|
| 50 |
+
"Total successful feedback submissions"
|
| 51 |
+
)
|
| 52 |
+
|
| 53 |
+
USER_PREDICTION_FEEDBACK = Counter(
|
| 54 |
+
"user_prediction_feedback_total",
|
| 55 |
+
"User feedback indicating whether the model prediction was correct",
|
| 56 |
+
["feedback"]
|
| 57 |
+
)
|
| 58 |
+
|
| 59 |
+
FEEDBACK_REQUEST_FAILED = Counter(
|
| 60 |
+
"feedback_submissions_failed_total",
|
| 61 |
+
"Total failed feedback submissions"
|
| 62 |
+
)
|
app/requirements.txt
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
fastapi==0.116.1
|
| 2 |
+
uvicorn==0.35.0
|
| 3 |
+
joblib==1.5.1
|
| 4 |
+
PyYAML==6.0.2
|
| 5 |
+
lime==0.2.0.1
|
| 6 |
+
gunicorn==23.0.0
|
| 7 |
+
prometheus-client==0.23.1
|
app/services/__pycache__/explainer.cpython-311.pyc
ADDED
|
Binary file (3.12 kB). View file
|
|
|
app/services/__pycache__/feedback.cpython-311.pyc
ADDED
|
Binary file (2.4 kB). View file
|
|
|
app/services/__pycache__/inference.cpython-311.pyc
ADDED
|
Binary file (5.56 kB). View file
|
|
|
app/services/explainer.py
ADDED
|
@@ -0,0 +1,55 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import sys
|
| 2 |
+
import numpy as np
|
| 3 |
+
from src.core.logger import logging
|
| 4 |
+
from src.core.exception import AppException
|
| 5 |
+
from src.app.monitoring.service_metrics import EXPLAINER_REQUEST_SUCCESS, EXPLAINER_REQUEST_FAILED
|
| 6 |
+
|
| 7 |
+
|
| 8 |
+
class ExplainerService:
    """LIME-based explanation service for the toxicity classifier.

    Wraps a LimeTextExplainer together with the trained XGBoost booster and
    its text vectorizer, and renders per-input explanations as HTML.
    """

    def __init__(self, explainer, model_booster, vectorizer):
        """
        Args:
            explainer: A LimeTextExplainer instance.
            model_booster: The trained XGBoost booster used for scoring.
            vectorizer: Transforms raw text into the model's feature space.
        """
        self.explainer = explainer
        self.booster = model_booster
        self.vectorizer = vectorizer

    def _get_prediction(self, text) -> np.ndarray:
        """Return class-probability rows in the two-column shape LIME expects."""
        features = self.vectorizer.transform(text)
        scores = self.booster.inplace_predict(features)
        # A 1-D output holds P(class=1) only; expand to [[P(0), P(1)], ...].
        if len(scores.shape) == 1:
            scores = np.vstack([1 - scores, scores]).T
        return scores

    def explain(self, text: str):
        """Generate a model explanation for the given text.

        Returns:
            HTML content of the explanation which is rendered in the UI.

        Raises:
            AppException: if LIME or the model fails while explaining.
        """
        try:
            result = self.explainer.explain_instance(
                text,
                self._get_prediction,
                num_features=10,
                num_samples=20
            )
            html = result.as_html()
            EXPLAINER_REQUEST_SUCCESS.inc()
            return html
        except Exception as e:
            EXPLAINER_REQUEST_FAILED.inc()
            logging.error(f"Explainer service failed: {e}", exc_info=True)
            raise AppException(e, sys)
|
app/services/feedback.py
ADDED
|
@@ -0,0 +1,38 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import sys
|
| 2 |
+
from datetime import datetime, timezone
|
| 3 |
+
from src.core.logger import logging
|
| 4 |
+
from src.core.exception import AppException
|
| 5 |
+
from src.app.monitoring.service_metrics import FEEDBACK_REQUEST_SUCCESS, USER_PREDICTION_FEEDBACK, FEEDBACK_REQUEST_FAILED
|
| 6 |
+
|
| 7 |
+
|
| 8 |
+
class FeedbackService:
    """Accepts user feedback on predictions and queues it for async persistence."""

    def __init__(self, feedback_event_consumer):
        # Background worker that buffers feedback records for batched DB writes.
        self.event_consumer_worker = feedback_event_consumer

    def submit_feedback(self, request_id, pred_label, feedback_label):
        """Record a single feedback submission.

        Args:
            request_id: Identifier of the original prediction request.
            pred_label: The label the model predicted.
            feedback_label: 1 when the user marked the prediction correct,
                anything else counts as incorrect.

        Returns:
            A status dict confirming the feedback was recorded.

        Raises:
            AppException: if queueing or metric updates fail.
        """
        try:
            record = {
                "request_id": request_id,
                # NOTE(review): key is "time_stamp" here but prediction records
                # use "timestamp" — confirm downstream consumers before unifying.
                "time_stamp": datetime.now(timezone.utc).isoformat(),
                "predicted_label": pred_label,
                "feedback_label": feedback_label
            }

            self.event_consumer_worker.add_event(record)
            FEEDBACK_REQUEST_SUCCESS.inc()

            outcome = "correct" if feedback_label == 1 else "incorrect"
            USER_PREDICTION_FEEDBACK.labels(feedback=outcome).inc()

            return {
                "status": "success",
                "message": "Feedback recorded successfully",
            }

        except Exception as e:
            logging.exception(f"Failed to submit feedback for request_id: {request_id} : {e}")
            FEEDBACK_REQUEST_FAILED.inc()
            raise AppException(e, sys)
|
app/services/inference.py
ADDED
|
@@ -0,0 +1,113 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import time, sys
|
| 2 |
+
from datetime import datetime, timezone
|
| 3 |
+
from src.core.logger import logging
|
| 4 |
+
from src.core.exception import AppException
|
| 5 |
+
from src.app.monitoring.service_metrics import (PREDICTION_REQUEST_SUCCESS, PREDICTION_REQUEST_FAILED,
|
| 6 |
+
PREDICTION_CLASS, INFERENCE_LATENCY, PREDICTION_CONFIDENCE)
|
| 7 |
+
|
| 8 |
+
class InferenceService:
    """Runs toxicity classification, records monitoring metrics, and queues
    prediction events for asynchronous persistence."""

    def __init__(self, model_booster, vectorizer, eval_threshold,
                 prediction_event_consumer, model_version):
        self.booster = model_booster
        self.vectorizer = vectorizer
        self.threshold = eval_threshold
        self.event_consumer_worker = prediction_event_consumer
        self.model_version = model_version

    def predict(self, request_id: str, input_tweet: str, text: str):
        """Classify a single preprocessed text and build the API response.

        Args:
            request_id: Unique identifier for this request.
            input_tweet: The original (raw) comment, stored in the event record.
            text: The preprocessed text fed to the vectorizer/model.

        Returns:
            A response dict with the prediction, optional low-confidence
            warning, and request metadata.

        Raises:
            AppException: if vectorization or model scoring fails.
            RuntimeError: if the model yields no prediction.
        """
        try:
            timestamp = datetime.now(timezone.utc).isoformat()
            start_time = time.perf_counter()

            features = self.vectorizer.transform([text])
            prob = self.booster.inplace_predict(features)  # P(class=1)
            pred = (prob > self.threshold).astype(int)

        except Exception as e:
            logging.exception(e)
            PREDICTION_REQUEST_FAILED.inc()
            raise AppException(e, sys)

        if prob is None or len(pred) == 0:
            logging.error("No prediction made by the model")
            PREDICTION_REQUEST_FAILED.inc()
            raise RuntimeError("No prediction made by the model")

        # Bucket the raw score into a coarse toxicity tier around the threshold.
        score = prob[0]
        if score > 0.70:
            toxicity = "strong"
        elif score > self.threshold + 0.05:
            toxicity = "high"
        elif score > self.threshold - 0.03:
            toxicity = "uncertain"
        else:
            toxicity = "none"

        # Confidence of the *predicted* class; margin measures distance from 0.5.
        confidence = float(score if pred[0] == 1 else 1 - score)
        confidence_margin = abs(2 * float(score) - 1)

        warnings = None
        if confidence_margin < 0.10:
            message=f"Prediction is close to model decision boundary. Confidence Margin: {round(confidence_margin, 4)}. Manual review is recommended!"
            warnings = {
                "code": "LOW_CONFIDENCE_MARGIN",
                "message": message
            }

        # Prepare the record to insert into database
        prediction_record = {
            "request_id": request_id,
            "timestamp": timestamp,
            "comment": input_tweet,
            "prediction": int(pred[0]),
            "confidence": round(confidence, 4)
        }

        # Hand the record to the batch writer for async insertion into MongoDB.
        self.event_consumer_worker.add_event(prediction_record)

        response_time = round(time.perf_counter() - start_time, 4)

        PREDICTION_REQUEST_SUCCESS.inc()
        INFERENCE_LATENCY.observe(response_time)          # latency histogram
        PREDICTION_CLASS.labels(class_label=str(pred[0])).inc()   # class distribution
        PREDICTION_CONFIDENCE.labels(class_label=str(pred[0])).observe(confidence)

        return {
            "id": request_id,
            "timestamp": timestamp,
            "object": "text-classification",
            "prediction": {
                "label": int(pred[0]),
                "confidence": round(confidence, 4),
                "toxicity": toxicity,
            },
            "warnings": warnings,
            "metadata": {
                "latency": response_time,
                "usage": {
                    "word_count": len(text.split()),
                    "total_characters": len(text)
                },
                "model": {
                    "name": "XGB-Classifier-Booster",
                    "version": self.model_version,
                    "vectorizer": str(type(self.vectorizer).__name__),
                },
                "streamable": False,
                "environment": "Production",
                "api_version": "v-2.0"
            }
        }
|
app/workers/__init__.py
ADDED
|
@@ -0,0 +1,144 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# This module contains the implementation of the BufferedEventConsumerWorker class, which is responsible for consuming events from a buffer queue and writing them to a MongoDB collection in batches.
|
| 2 |
+
|
| 3 |
+
import time
|
| 4 |
+
import threading
|
| 5 |
+
from typing import Any
|
| 6 |
+
from queue import Queue, Empty, Full
|
| 7 |
+
from src.core.logger import logging
|
| 8 |
+
|
| 9 |
+
class BufferedEventConsumerWorker:
    # Single daemon thread consumes events from an in-memory queue and writes
    # them to MongoDB in batches, flushing on size or time. A `None` record is
    # the shutdown sentinel.
    def __init__(self, mongo_client, database_name, collection_name, queue_maxsize: int = 1200,
                 max_batch_size: int = 1000, flush_interval: int = 30, mongo_timeout: int = 5):
        """
        Initializes the buffered event consumer worker and starts its thread.

        Args:
            mongo_client (MongoClient): The MongoDB client.
            database_name (str): The name of the MongoDB database.
            collection_name (str): The name of the MongoDB collection.
            queue_maxsize (int, optional): The maximum size of the queue. Defaults to 1200.
            max_batch_size (int, optional): The maximum batch size for write operations. Defaults to 1000.
            flush_interval (int, optional): The interval (in seconds) at which to flush the queue. Defaults to 30.
            mongo_timeout (int, optional): The timeout (in seconds) for write operations. Defaults to 5.
                NOTE(review): stored but not used anywhere in this class — confirm
                whether the client applies it or it can be removed.
        """
        self.queue: Queue[dict[str, Any] | None] = Queue(queue_maxsize)
        self.max_batch_size = max_batch_size
        self.flush_interval = flush_interval
        self.mongo_timeout = mongo_timeout

        # Signals the worker loop to stop; checked at the top of each iteration.
        self.shutdown_event: threading.Event = threading.Event()

        self.client = mongo_client
        self.database_name = database_name
        self.collection_name = collection_name

        # Daemon thread so an unclean process exit is not blocked by the worker.
        self.worker: threading.Thread = threading.Thread(
            target=self._writer_worker,
            daemon=True
        )
        self.worker.start()

    def add_event(self, record: dict) -> None:
        """
        Adds a record to the buffer queue.

        Drops the record (with a warning) if the queue stays full for 0.3s —
        producers are never blocked indefinitely.

        Args:
            record (dict): The record to add to the buffer queue.
        """
        try:
            self.queue.put(record, timeout=0.3)
        except Full:
            logging.warning(f"Failed to add record in buffer queue: Queue is full")

    def shutdown(self) -> None:
        """
        Gracefully shut down the worker thread.
        Ensures final flush before exit.
        """
        self.shutdown_event.set()

        # Wake up worker if it's blocked on queue.get()
        self.queue.put(None)
        # Wait the main thread until worker fully exits.
        self.worker.join()


    # INTERNAL WORKER
    def _writer_worker(self) -> None:
        """
        The worker thread responsible for consuming records from the buffer queue and writing them to MongoDB.

        It runs indefinitely until the shutdown event is set, at which point it will drain the queue quickly and exit.
        The worker thread tries to flush the queue at regular intervals, or when the batch size reaches the maximum threshold.
        If the queue is empty, it will wait indefinitely for new records to arrive. If a timeout is reached, it will flush the queue and reset the timer.
        """
        batch = []
        # Wall-clock time the current batch started accumulating; None while empty.
        first_record_time = None

        while not self.shutdown_event.is_set():
            try:
                if first_record_time is None:
                    # No records yet → wait indefinitely
                    record = self.queue.get()
                else:
                    # Batch in progress: only wait until its flush deadline.
                    elapsed = time.time() - first_record_time
                    remaining = max(self.flush_interval - elapsed, 0)
                    record = self.queue.get(timeout=remaining)

                # Shutdown sentinel — exit the loop; final flush happens below.
                if record is None:
                    break

                batch.append(record)

                if first_record_time is None:
                    first_record_time = time.time()

                # Drain quickly if batch growing
                while len(batch) < self.max_batch_size:
                    try:
                        record = self.queue.get_nowait()
                        # NOTE: a sentinel seen here only breaks the drain loop;
                        # the outer `while` then exits via shutdown_event (set
                        # before the sentinel is enqueued in shutdown()).
                        if record is None:
                            break
                        batch.append(record)
                    except Empty:
                        break

            except Empty:
                # Timeout reached
                pass

            # Flush conditions
            if batch and (
                len(batch) >= self.max_batch_size or
                (first_record_time and
                 time.time() - first_record_time >= self.flush_interval)
            ):
                self._flush(batch)
                batch.clear()
                first_record_time = None

        # Final flush on shutdown
        if batch:
            self._flush(batch)
        logging.info("BufferedEventConsumer worker stopped cleanly.")

    # Database flush
    def _flush(self, batch_records: list):
        """
        Writes batch records to MongoDB with basic failure handling.

        Failures are logged and swallowed (best-effort delivery): the records
        in a failed batch are lost, since the caller clears the batch afterwards.

        Args:
            batch_records (list): The list of records to flush to the database.

        Raises:
            Exception: If an error occurs while flushing the batch records to the database.
        """
        try:
            self.client.insert_docs(self.collection_name,
                                    self.database_name,
                                    batch_records
                                    )
            logging.info(f"Flushed {len(batch_records)} records to MongoDB")

        except Exception as e:
            logging.error(f"BufferedBatchWriter failed to flush: {e}", exc_info=True)
|
app/workers/__pycache__/__init__.cpython-311.pyc
ADDED
|
Binary file (7.08 kB). View file
|
|
|
components/__init__.py
ADDED
|
File without changes
|
components/__pycache__/__init__.cpython-311.pyc
ADDED
|
Binary file (163 Bytes). View file
|
|
|
components/__pycache__/data_ingestion.cpython-311.pyc
ADDED
|
Binary file (5 kB). View file
|
|
|
components/__pycache__/data_preprocessing.cpython-311.pyc
ADDED
|
Binary file (9.21 kB). View file
|
|
|
components/__pycache__/data_validation.cpython-311.pyc
ADDED
|
Binary file (13.9 kB). View file
|
|
|