---
title: ML Inference Evaluation Pipeline
emoji: πŸ“ˆ
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 4.21.0
app_file: app.py
pinned: false
license: mit
---

ML Inference & Evaluation Pipeline (NLP)

This project demonstrates a production-style machine learning pipeline for serving and evaluating an NLP classification model under CPU-only constraints.
It emphasizes reliability, reproducibility, and ML engineering best practices, rather than model hype.


πŸ” Problem Statement

Building ML systems for real-world use requires more than high training accuracy.
This project covers the end-to-end lifecycle of an ML service:

  • Training a strong, interpretable NLP baseline
  • Evaluating model quality on held-out data
  • Serving predictions through a REST API
  • Measuring inference latency
  • Validating system behavior with automated tests

🧠 Model Overview

  • Task: Binary text classification (fact-checking style)
  • Dataset: LIAR dataset (short, real-world political statements)
  • Model: TF-IDF + Logistic Regression
  • Why this model?
    • Low-latency CPU inference
    • Interpretable features
    • Stable and reproducible behavior
    • Suitable baseline for production systems
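
For illustration, a baseline of this kind can be assembled in a few lines of scikit-learn. The sketch below is not the project's actual training script; the file path and column names (data/liar_train.csv, statement, label) are placeholders:

```python
# Sketch of the TF-IDF + Logistic Regression baseline (paths and columns are illustrative).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

df = pd.read_csv("data/liar_train.csv")  # placeholder path
X_train, X_test, y_train, y_test = train_test_split(
    df["statement"], df["label"],
    test_size=0.2, random_state=42, stratify=df["label"],
)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
pipeline.fit(X_train, y_train)
print(classification_report(y_test, pipeline.predict(X_test)))
```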

Performance (Test Set):

  • F1-score: ~0.64–0.65
  • Accuracy: ~60%
  • Balanced precision and recall

Note: The LIAR dataset is intentionally difficult and context-dependent; these results are expected for classical models.


πŸ—οΈ System Architecture

Request β†’ FastAPI β†’ Preprocessing β†’ Model Inference β†’ Response
                                     β†³ Latency Logging

The model is loaded once at startup and reused across requests to minimize overhead.
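
A minimal sketch of that pattern, assuming the serialized pipeline is stored at models/model.joblib (the path is illustrative, and the real app also exposes /health):

```python
# Load the model once at startup and reuse it for every request.
import time

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("models/model.joblib")  # loaded once, shared across requests

class PredictRequest(BaseModel):
    text: str

@app.post("/predict")
def predict(req: PredictRequest):
    start = time.perf_counter()
    proba = model.predict_proba([req.text])[0]
    label = int(proba.argmax())
    latency_ms = (time.perf_counter() - start) * 1000
    return {
        "prediction": label,
        "confidence": round(float(proba[label]), 2),
        "latency_ms": round(latency_ms, 2),
    }
```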


πŸš€ API Endpoints

Health Check

GET /health

Returns service status.

Prediction

POST /predict

Request

{
  "text": "The government announced a new tax policy today."
}

Response

{
  "prediction": 0,
  "confidence": 0.63,
  "latency_ms": 85.12
}
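
For example, once the API is running locally (see Setup below), the endpoint can be exercised from Python; this assumes the requests package is installed:

```python
# Example client call against a locally running instance.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"text": "The government announced a new tax policy today."},
)
print(resp.json())  # e.g. {"prediction": 0, "confidence": 0.63, "latency_ms": 85.12}
```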

πŸ“Š Evaluation & Monitoring

  • Offline evaluation using precision, recall, and F1-score
  • Inference latency logged per request
  • Prediction distribution monitored to detect bias or collapse

All metrics are computed on held-out test data.
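
A sketch of the offline evaluation step, reusing the pipeline and held-out split from the training sketch above (names are illustrative, not the project's actual script):

```python
# Offline evaluation on the held-out test split.
from collections import Counter

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_pred = pipeline.predict(X_test)
precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="binary")
print(f"accuracy={accuracy_score(y_test, y_pred):.3f} "
      f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")

# Prediction distribution: a collapse to a single class is an early warning sign.
print(Counter(y_pred.tolist()))
```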


πŸ§ͺ Testing

Automated tests are implemented using pytest and FastAPI’s TestClient.

Tests verify:

  • API availability
  • Response schema correctness
  • Valid prediction outputs

Run tests locally:

pytest
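
A minimal example of such a test, assuming the FastAPI application is importable as app from app.main (matching the uvicorn command below) and that the endpoints behave as documented above:

```python
# test_api.py -- schema and sanity checks, no running server required.
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)

def test_health():
    assert client.get("/health").status_code == 200

def test_predict_schema():
    resp = client.post("/predict", json={"text": "A short test statement."})
    assert resp.status_code == 200
    body = resp.json()
    assert set(body) >= {"prediction", "confidence", "latency_ms"}
    assert body["prediction"] in (0, 1)
    assert 0.0 <= body["confidence"] <= 1.0
```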

βš™οΈ Setup & Run Locally

1. Install dependencies

pip install -r requirements.txt

2. Start the API

uvicorn app.main:app --reload

3. Open Swagger UI

http://127.0.0.1:8000/docs

πŸ”’ Reproducibility

  • Training and inference environments are version-pinned
  • scikit-learn version is aligned to avoid serialization issues
  • Model artifacts are stored separately from inference code
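
One way to make that contract explicit is to record the scikit-learn version next to the artifact and verify it at load time. The helpers below are a sketch under assumed names (models/model.joblib with a sidecar .meta.json), not the project's actual layout:

```python
# Save/load helpers that record and verify the scikit-learn version (illustrative).
import json

import joblib
import sklearn

def save_artifact(pipeline, path="models/model.joblib"):
    joblib.dump(pipeline, path)
    with open(path + ".meta.json", "w") as f:
        json.dump({"sklearn_version": sklearn.__version__}, f)

def load_artifact(path="models/model.joblib"):
    with open(path + ".meta.json") as f:
        meta = json.load(f)
    if meta["sklearn_version"] != sklearn.__version__:
        raise RuntimeError(
            f"Model trained with scikit-learn {meta['sklearn_version']}, "
            f"but {sklearn.__version__} is installed."
        )
    return joblib.load(path)
```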

πŸ“Œ Key Takeaways

  • Demonstrates ML engineering maturity, not just modeling
  • Focuses on deployment constraints and evaluation discipline
  • Designed to be interview-safe, explainable, and production-oriented

πŸ“„ License

Released under the MIT License (see the Space metadata). This project is intended for educational and demonstration purposes.
