---
title: ML Inference Evaluation Pipeline
emoji: πŸ“ˆ
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 4.21.0
app_file: app.py
pinned: false
license: mit
---

ML Inference & Evaluation Pipeline (NLP)

This project demonstrates a production-style machine learning pipeline for serving and evaluating an NLP classification model under CPU-only constraints.
It emphasizes reliability, reproducibility, and ML engineering best practices, rather than model hype.


πŸ” Problem Statement

Building ML systems for real-world use requires more than high training accuracy.
This project covers the end-to-end lifecycle of an ML service:

  • Training a strong, interpretable NLP baseline
  • Evaluating model quality on held-out data
  • Serving predictions through a REST API
  • Measuring inference latency
  • Validating system behavior with automated tests

🧠 Model Overview

  • Task: Binary text classification (fact-checking style)
  • Dataset: LIAR dataset (short, real-world political statements)
  • Model: TF-IDF + Logistic Regression
  • Why this model?
    • Low-latency CPU inference
    • Interpretable features
    • Stable and reproducible behavior
    • Suitable baseline for production systems
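
For illustration, a baseline of this kind can be assembled in a few lines of scikit-learn. The sketch below is not the project's actual training script; the file path and column names (data/liar_train.csv, statement, label) are placeholders:

```python
# Sketch of the TF-IDF + Logistic Regression baseline (paths and columns are illustrative).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("data/liar_train.csv")  # placeholder path
X_train, X_test, y_train, y_test = train_test_split(
    df["statement"], df["label"],
    test_size=0.2, random_state=42, stratify=df["label"],
)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
pipeline.fit(X_train, y_train)
print(classification_report(y_test, pipeline.predict(X_test)))
```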

Performance (Test Set):

  • F1-score: ~0.64–0.65
  • Accuracy: ~60%
  • Balanced precision and recall

Note: The LIAR dataset is intentionally difficult and context-dependent; these results are expected for classical models.


πŸ—οΈ System Architecture

Request β†’ FastAPI β†’ Preprocessing β†’ Model Inference β†’ Response
                                     β†³ Latency Logging

The model is loaded once at startup and reused across requests to minimize overhead.
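
A minimal sketch of that pattern, assuming the serialized pipeline is stored at models/model.joblib (the path is illustrative, and the real app also exposes /health):

```python
# Load the model once at startup and reuse it for every request.
import time

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("models/model.joblib")  # loaded once, shared across requests

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(req: PredictRequest):
    start = time.perf_counter()
    proba = model.predict_proba([req.text])[0]
    label = int(proba.argmax())
    latency_ms = (time.perf_counter() - start) * 1000
    return {
        "prediction": label,
        "confidence": round(float(proba[label]), 2),
        "latency_ms": round(latency_ms, 2),
    }
```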


πŸš€ API Endpoints

Health Check

GET /health

Returns service status.

Prediction

POST /predict

Request

{
  "text": "The government announced a new tax policy today."
}

Response

{
  "prediction": 0,
  "confidence": 0.63,
  "latency_ms": 85.12
}
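
For example, once the API is running locally (see Setup below), the endpoint can be exercised from Python; this assumes the requests package is installed:

```python
# Example client call against a locally running instance.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"text": "The government announced a new tax policy today."},
)
print(resp.json())  # e.g. {"prediction": 0, "confidence": 0.63, "latency_ms": 85.12}
```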

πŸ“Š Evaluation & Monitoring

  • Offline evaluation using precision, recall, and F1-score
  • Inference latency logged per request
  • Prediction distribution monitored to detect bias or collapse

All metrics are computed on held-out test data.
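
A sketch of the offline evaluation step, reusing the pipeline and held-out split from the training sketch above (names are illustrative, not the project's actual script):

```python
# Offline evaluation on the held-out test split.
from collections import Counter

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_pred = pipeline.predict(X_test)
precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="binary")
print(f"accuracy={accuracy_score(y_test, y_pred):.3f} "
      f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")

# Prediction distribution: a collapse to a single class is an early warning sign.
print(Counter(y_pred.tolist()))
```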


πŸ§ͺ Testing

Automated tests are implemented using pytest and FastAPI’s TestClient.

Tests verify:

  • API availability
  • Response schema correctness
  • Valid prediction outputs

Run tests locally:

pytest
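
A minimal example of such a test, assuming the FastAPI application is importable as app from app.main (matching the uvicorn command below) and that the endpoints behave as documented above:

```python
# test_api.py -- schema and sanity checks, no running server required.
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)

def test_health():
    assert client.get("/health").status_code == 200

def test_predict_schema():
    resp = client.post("/predict", json={"text": "A short test statement."})
    assert resp.status_code == 200
    body = resp.json()
    assert set(body) >= {"prediction", "confidence", "latency_ms"}
    assert body["prediction"] in (0, 1)
    assert 0.0 <= body["confidence"] <= 1.0
```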

βš™οΈ Setup & Run Locally

1. Install dependencies

pip install -r requirements.txt

2. Start the API

uvicorn app.main:app --reload

3. Open Swagger UI

http://127.0.0.1:8000/docs

πŸ”’ Reproducibility

  • Training and inference environments are version-pinned
  • scikit-learn version is aligned to avoid serialization issues
  • Model artifacts are stored separately from inference code
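
One way to make that contract explicit is to record the scikit-learn version next to the artifact and verify it at load time. The helpers below are a sketch under assumed names (models/model.joblib with a sidecar .meta.json), not the project's actual layout:

```python
# Save/load helpers that record and verify the scikit-learn version (illustrative).
import json

import joblib
import sklearn

def save_artifact(pipeline, path="models/model.joblib"):
    joblib.dump(pipeline, path)
    with open(path + ".meta.json", "w") as f:
        json.dump({"sklearn_version": sklearn.__version__}, f)

def load_artifact(path="models/model.joblib"):
    with open(path + ".meta.json") as f:
        meta = json.load(f)
    if meta["sklearn_version"] != sklearn.__version__:
        raise RuntimeError(
            f"Model trained with scikit-learn {meta['sklearn_version']}, "
            f"but {sklearn.__version__} is installed."
        )
    return joblib.load(path)
```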

πŸ“Œ Key Takeaways

  • Demonstrates ML engineering maturity, not just modeling
  • Focuses on deployment constraints and evaluation discipline
  • Designed to be interview-safe, explainable, and production-oriented

πŸ“„ License

Released under the MIT License (see the Space metadata). This project is intended for educational and demonstration purposes.
