---
title: ML Inference Evaluation Pipeline
emoji: π
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 4.21.0
app_file: app.py
pinned: false
license: mit
---
# ML Inference & Evaluation Pipeline (NLP)
This project demonstrates a production-style machine learning pipeline for serving and evaluating an NLP classification model under CPU-only constraints.
It emphasizes reliability, reproducibility, and ML engineering best practices, rather than model hype.
## Problem Statement
Building ML systems for real-world use requires more than high training accuracy.
This project covers the end-to-end lifecycle of an ML service:
- Training a strong, interpretable NLP baseline
- Evaluating model quality on held-out data
- Serving predictions through a REST API
- Measuring inference latency
- Validating system behavior with automated tests
## Model Overview
- Task: Binary text classification (fact-checking style)
- Dataset: LIAR dataset (short, real-world political statements)
- Model: TF-IDF + Logistic Regression (see the training sketch at the end of this section)
- Why this model?
  - Low-latency CPU inference
  - Interpretable features
  - Stable and reproducible behavior
  - Suitable baseline for production systems
Performance (Test Set):
- F1-score: ~0.64–0.65
- Accuracy: ~60%
- Balanced precision and recall
Note: The LIAR dataset is intentionally difficult and context-dependent; these results are expected for classical models.
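For illustration, a minimal training sketch of this baseline is shown below; the file paths, column names, and hyperparameters are assumptions rather than the project's exact code.

```python
# train_baseline.py — hypothetical sketch of the TF-IDF + Logistic Regression baseline
import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline

# Assumed layout: LIAR statements already collapsed to binary labels in these CSVs
train_df = pd.read_csv("data/train.csv")  # columns: text, label
test_df = pd.read_csv("data/test.csv")

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), max_features=50_000)),
    ("clf", LogisticRegression(max_iter=1000)),
])

pipeline.fit(train_df["text"], train_df["label"])

# Offline evaluation on held-out data
preds = pipeline.predict(test_df["text"])
print(classification_report(test_df["label"], preds))

# Store the artifact separately from the inference code
joblib.dump(pipeline, "artifacts/model.joblib")
```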
## System Architecture

```
Request → FastAPI → Preprocessing → Model Inference → Response
                                          ↳ Latency Logging
```
The model is loaded once at startup and reused across requests to minimize overhead.
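A minimal sketch of this serving pattern, assuming the artifact path and module layout from the training sketch above (field names mirror the API described below):

```python
# app/main.py — hypothetical sketch of the serving pattern
import time

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the model once at startup and reuse it across requests
model = joblib.load("artifacts/model.joblib")


class PredictRequest(BaseModel):
    text: str


@app.get("/health")
def health():
    return {"status": "ok"}


@app.post("/predict")
def predict(req: PredictRequest):
    start = time.perf_counter()
    proba = model.predict_proba([req.text])[0]
    latency_ms = (time.perf_counter() - start) * 1000
    return {
        "prediction": int(proba.argmax()),
        "confidence": round(float(proba.max()), 2),
        "latency_ms": round(latency_ms, 2),
    }
```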
## API Endpoints

### Health Check

```
GET /health
```

Returns service status.

### Prediction

```
POST /predict
```

**Request**

```json
{
  "text": "The government announced a new tax policy today."
}
```

**Response**

```json
{
  "prediction": 0,
  "confidence": 0.63,
  "latency_ms": 85.12
}
```
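For example, calling the prediction endpoint from Python (the URL matches the local uvicorn setup below; `requests` is just one possible client):

```python
import requests

resp = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"text": "The government announced a new tax policy today."},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"prediction": 0, "confidence": 0.63, "latency_ms": 85.12}
```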
## Evaluation & Monitoring
- Offline evaluation using precision, recall, and F1-score
- Inference latency logged per request
- Prediction distribution monitored to detect bias or collapse
All metrics are computed on held-out test data.
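A sketch of how these offline metrics and the prediction distribution could be computed (names are illustrative, not the project's exact code):

```python
from collections import Counter

from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def evaluate(y_true, y_pred):
    """Compute offline metrics on held-out test data."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary"
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Prediction distribution, used to spot bias or class collapse
        "prediction_counts": dict(Counter(y_pred)),
    }
```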
## Testing

Automated tests are implemented using pytest and FastAPI's TestClient.
Tests verify:
- API availability
- Response schema correctness
- Valid prediction outputs
Run tests locally:

```bash
pytest
```
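A minimal sketch of what such tests might look like (the module path `app.main` and the exact assertions are assumptions):

```python
# tests/test_api.py — hypothetical sketch of the API tests
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


def test_health():
    # API availability
    resp = client.get("/health")
    assert resp.status_code == 200


def test_predict_schema():
    # Response schema correctness and valid prediction outputs
    resp = client.post("/predict", json={"text": "Taxes were raised last year."})
    assert resp.status_code == 200
    body = resp.json()
    assert set(body) == {"prediction", "confidence", "latency_ms"}
    assert body["prediction"] in (0, 1)
    assert 0.0 <= body["confidence"] <= 1.0
```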
## Setup & Run Locally

1. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

2. Start the API

   ```bash
   uvicorn app.main:app --reload
   ```

3. Open Swagger UI

   http://127.0.0.1:8000/docs
## Reproducibility
- Training and inference environments are version-pinned
- scikit-learn version is aligned to avoid serialization issues
- Model artifacts are stored separately from inference code
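One possible guard against the serialization issue mentioned above; the metadata file recording the training-time scikit-learn version is an assumed convention, not necessarily part of this repo:

```python
# Hypothetical guard: refuse to load an artifact pickled under a different scikit-learn version
import json

import joblib
import sklearn

with open("artifacts/metadata.json") as f:  # assumed file written at training time
    expected = json.load(f)["sklearn_version"]

if sklearn.__version__ != expected:
    raise RuntimeError(
        f"Model was trained with scikit-learn {expected}, "
        f"but {sklearn.__version__} is installed."
    )

model = joblib.load("artifacts/model.joblib")
```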
## Key Takeaways
- Demonstrates ML engineering maturity, not just modeling
- Focuses on deployment constraints and evaluation discipline
- Designed to be interview-safe, explainable, and production-oriented
## License
This project is released under the MIT License and is intended for educational and demonstration purposes.