LeonardoMdSA committed on
Commit
3915e0a
·
1 Parent(s): b788390

README update

README.md CHANGED
@@ -9,100 +9,260 @@ pinned: false
  license: mit
  ---
 
- # Under construction...
 
- venv\Scripts\activate
 
- uvicorn app.main:app --reload --host 127.0.0.1 --port 8000
 
- ### Tests
 
  pytest -v
 
- Or manual smoke test in test_backend.py
-
-
- ### Train-evaluate model
-
- python scripts\seed_data.py
-
- python scripts\train_model.py
-
- python scripts\evaluate.py
-
- ## Initial structure
-
- Context-aware NLP classification platform with MCP/
- ├─ Dockerfile
- ├─ docker-compose.yml
- ├─ LICENSE
- ├─ README.md
- ├─ requirements-dev.txt
- ├─ requirements.txt
- ├─ start.sh
- ├─ test_backend.py
- ├─ app/
- │ ├─ config.py
- │ ├─ logging_config.py
- │ ├─ main.py # FastAPI entrypoint
- │ ├─ api/
- │ │ ├─ routes.py # API endpoints (e.g., /predict)
- │ │ └─ schemas.py
- │ ├─ classification/
- │ │ ├─ decision.py
- │ │ ├─ llm_adapter.py
- │ │ ├─ model.py
- │ │ ├─ preprocess.py
- │ │ └─ sklearn_model.py
- │ ├─ context/
- │ │ └─ resolver.py
- │ └─ logging/
- │   ├─ context_log.py
- │   └─ inference_log.py
- ├─ orchestration/
- │ ├─ context_resolver.py
- │ └─ mcp_client.py
- ├─ utils/
- │ └─ validators.py
- ├─ data/
- │ ├─ mcp/
- │ │ ├─ history.json
- │ │ ├─ policies.json
- │ │ └─ taxonomy.json
- │ ├─ processed/
- │ ├─ raw/
- │ └─ samples/
- │   └─ training_data.json
- ├─ docs/
- │ └─ TECH_DEBT.md
- ├─ logs/
- ├─ mcp_servers/
- │ ├─ history_server/
- │ │ ├─ server.py
- │ │ └─ data/
- │ │   └─ labels.csv
- │ ├─ policy_server/
- │ │ ├─ server.py
- │ │ └─ data/
- │ │   └─ rules.yaml
- │ └─ taxonomy_server/
- │   ├─ server.py
- │   └─ data/
- ├─ models/
- │ └─ trained_pipeline.joblib
- ├─ scripts/
- │ ├─ evaluate.py
- │ ├─ seed_data.py
- │ └─ train_model.py
- ├─ tests/
- │ ├─ conftest.py
- │ ├─ test_api.py
- │ ├─ test_classification.py
- │ ├─ test_context_resolution.py
- │ └─ test_mcp_servers.py
- └─ ui/
-   ├─ static/
-   │ ├─ style.css
-   │ └─ script.js
-   └─ templates/
-     └─ index.html
 
  license: mit
  ---
 
+ # Context-aware NLP Classification Platform with MCP
 
+ ## Overview
 
+ This repository implements a **context-aware NLP classification platform** that combines a lightweight TF-IDF + Logistic Regression baseline with optional **LLM-assisted context re-ranking** via MCP (Model Context Protocol). It supports multi-domain classification (finance, HR, legal), structured context resolution, logging, and evaluation.
 
+ The platform is modular and can run either **locally in a virtual environment** or inside a **Docker container**.
 
+ ---
+ 
+ ## Repository Structure
+ 
+ ```
+ Dockerfile
+ LICENSE
+ README.md
+ requirements-dev.txt
+ requirements.txt
+ 
+ app/
+   config.py                # Configuration and settings
+   logging_config.py        # Logging configuration
+   main.py                  # Main entry point for API server
+   api/
+     routes.py              # FastAPI routes
+     schemas.py             # Pydantic schemas
+   classification/
+     decision.py            # Classification decision & abstention logic
+     llm_adapter.py         # Optional LLM integration for context
+     model.py               # Abstract classifier orchestration
+     preprocess.py          # Text preprocessing and tokenization
+     sklearn_model.py       # TF-IDF + Logistic Regression classifier
+   context/
+     resolver.py            # Context resolution logic
+   logging/
+     context_log.py         # Context logging to JSON
+     inference_log.py       # Inference logging to JSON
+ 
+ orchestration/
+   context_resolver.py      # MCP-based structured context orchestration
+   mcp_client.py            # MCP server communication utilities
+ 
+ utils/
+   validators.py            # Metadata validation utilities
+ 
+ data/
+   samples/
+     train.json             # Training samples (small dataset)
+     eval.json              # Evaluation samples
+     training_data.json     # Full training dataset
+ 
+ docs/
+   TECH_DEBT.md             # Technical debt documentation
+ 
+ logs/                      # Runtime logs
+ 
+ mcp_servers/
+   history_server/          # Historical label MCP server
+     server.py
+     data/labels.csv
+   policy_server/           # Policy MCP server
+     server.py
+     data/rules.yaml
+   taxonomy_server/         # Taxonomy MCP server
+     server.py
+     data/taxonomy.sqlite
+ 
+ models/
+   trained_pipeline.joblib  # Trained sklearn model pipeline
+ 
+ scripts/
+   evaluate.py              # Offline evaluation script
+   populate_taxonomy.py     # Populate taxonomy.sqlite for MCP
+   seed_data.py             # Seed initial data into MCP files
+   train_model.py           # Train sklearn model from JSON dataset
+ 
+ tests/
+   conftest.py              # Pytest configuration
+   test_api.py              # API endpoint tests
+   test_classification.py   # Classification module tests
+   test_context_resolution.py  # Context resolver tests
+   test_mcp_servers.py      # MCP server tests
+ 
+ ui/
+   static/
+     script.js              # Frontend JS
+     style.css              # Frontend CSS
+   templates/
+     index.html             # Frontend template
+ ```
+ 
+ ---
+ 
+ ## Installation (Local)
+ 
+ ### 1. Clone the repository
+ 
+ ```bash
+ git clone https://github.com/LeonardoMdSACode/Context-aware-NLP-classification-platform-with-MCP.git
+ cd Context-aware-NLP-classification-platform-with-MCP
+ ```
+ 
+ ### 2. Create a virtual environment
+ 
+ ```bash
+ python -m venv venv
+ source venv/bin/activate   # Linux/macOS
+ venv\Scripts\activate      # Windows
+ ```
+ 
+ ### 3. Install dependencies
+ 
+ ```bash
+ pip install -r requirements.txt
+ pip install -r requirements-dev.txt   # for testing and development
+ ```
+ 
+ ### 4. Populate the MCP taxonomy (first-time setup)
+ 
+ ```bash
+ python scripts/populate_taxonomy.py
+ ```
+ 
+ This populates `mcp_servers/taxonomy_server/data/taxonomy.sqlite`.
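As a rough sketch of what a populated taxonomy store can look like (the table and column names below are assumptions for illustration; the real schema is defined by `scripts/populate_taxonomy.py`):

```python
import sqlite3

# In-memory stand-in for mcp_servers/taxonomy_server/data/taxonomy.sqlite.
# The schema here is hypothetical; check scripts/populate_taxonomy.py for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxonomy (label TEXT PRIMARY KEY, description TEXT)")
conn.executemany(
    "INSERT INTO taxonomy (label, description) VALUES (?, ?)",
    [
        ("finance", "Budgets, invoices, revenue and accounting topics"),
        ("hr", "Hiring, payroll, benefits and people operations"),
        ("legal", "Contracts, compliance and regulatory topics"),
    ],
)
conn.commit()

# The taxonomy MCP server would serve rows like these over HTTP.
rows = conn.execute("SELECT label FROM taxonomy ORDER BY label").fetchall()
print([r[0] for r in rows])  # ['finance', 'hr', 'legal']
```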
+ 
+ ### 5. Train the model
+ 
+ ```bash
+ python scripts/train_model.py
+ ```
+ 
+ This trains the TF-IDF + Logistic Regression model and saves it to `models/trained_pipeline.joblib`.
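The baseline is a standard scikit-learn pipeline. A minimal, self-contained sketch of the same idea on toy data (the actual script reads the datasets under `data/samples/` and persists the fitted pipeline with joblib):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny toy dataset; the real training data lives in data/samples/.
texts = [
    "quarterly revenue and budget forecast",
    "invoice payment overdue",
    "new hire onboarding checklist",
    "employee benefits enrollment",
    "contract breach liability clause",
    "regulatory compliance audit",
]
labels = ["finance", "finance", "hr", "hr", "legal", "legal"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),            # text -> sparse TF-IDF features
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(texts, labels)

pred = pipeline.predict(["invoice payment and budget forecast"])[0]
print(pred)
```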
+ 
+ ### 6. Evaluate the model
+ 
+ ```bash
+ python scripts/evaluate.py
+ ```
+ 
+ This prints offline evaluation metrics (accuracy, precision, recall, F1-score).
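As an illustration of how those metrics are computed (toy labels, not the project's evaluation set):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["finance", "hr", "legal", "finance", "hr", "legal"]
y_pred = ["finance", "hr", "legal", "finance", "legal", "legal"]  # one mistake

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.83 precision=0.89 recall=0.83 f1=0.82
```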
+ 
+ ---
+ 
+ ## Running the API Locally
+ 
+ ### 1. Start the server
+ 
+ ```bash
+ uvicorn app.main:app --reload
+ ```
+ 
+ This runs the FastAPI server at `http://127.0.0.1:8000`.
+ 
+ ### 2. Run the embedded MCP servers (embedded mode)
+ 
+ Embedded MCP servers are started automatically via `app.orchestration.mcp_client.start_embedded_mcp_servers()`.
+ 
+ ### 3. Access the UI
+ 
+ Open your browser at `http://127.0.0.1:8000` to use the HTML/JS frontend.
+ 
+ ### 4. API Endpoints
+ 
+ * `POST /classify`: send `text` and optional `metadata` to get a classification with context.
+ * Swagger UI: `http://127.0.0.1:8000/docs`
+ 
+ ---
+ 
+ ## Testing
+ 
+ ### 1. Run all tests
+ 
+ ```bash
  pytest -v
+ ```
+ 
+ ### 2. Smoke test
+ 
+ * Run `test_backend.py` to ensure the core API routes respond correctly.
+ * Check that the MCP servers respond on their `/resolve` endpoints.
+ 
+ ### 3. Module-specific tests
+ 
+ * `test_classification.py` → validates `SklearnClassifier` and `LLMAdapter` predictions.
+ * `test_context_resolution.py` → checks context resolver output.
+ * `test_mcp_servers.py` → verifies the taxonomy, policy, and history MCP servers.
+ 
+ ---
+ 
+ ## How It Works
+ 
+ ### 1. Classification Layer
+ 
+ * **Baseline:** `app/classification/sklearn_model.py` → TF-IDF + Logistic Regression
+ * **LLM-assisted:** `app/classification/llm_adapter.py` → optional MCP context re-ranking
+ * **Decision logic:** `app/classification/decision.py` → applies confidence thresholds, abstention, and logging
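The core of confidence-based abstention can be sketched like this (the threshold value and return shape are illustrative, not the actual `decision.py` interface):

```python
def decide(probabilities: dict[str, float], threshold: float = 0.6) -> dict:
    """Pick the top label, or abstain when confidence is too low.

    `probabilities` maps label -> predicted probability (illustrative shape).
    """
    label, confidence = max(probabilities.items(), key=lambda kv: kv[1])
    if confidence < threshold:
        # Abstain: defer to a human or to LLM-assisted re-ranking.
        return {"label": None, "confidence": confidence, "abstained": True}
    return {"label": label, "confidence": confidence, "abstained": False}

print(decide({"finance": 0.82, "hr": 0.10, "legal": 0.08}))  # confident -> finance
print(decide({"finance": 0.40, "hr": 0.35, "legal": 0.25}))  # low confidence -> abstain
```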
+ 
+ ### 2. Context Resolution
+ 
+ * **Embedded MCP mode:** `app/orchestration/context_resolver.py` loads local JSON/SQLite files
+ * **Distributed MCP mode:** fetches context from the taxonomy, policy, and history MCP servers
+ * All context resolution is logged for auditability
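Conceptually, both modes reduce to merging per-source context into one structured object. A hedged sketch (all field names here are assumptions; the real shapes live in the resolver modules):

```python
def merge_context(taxonomy: dict, policies: dict, history: dict) -> dict:
    """Merge per-source context into one structured dict (illustrative shape)."""
    return {
        "taxonomy": taxonomy,   # label definitions
        "policies": policies,   # routing/override rules
        "history": history,     # prior labels for similar texts
        "sources": ["taxonomy", "policy", "history"],
    }

ctx = merge_context(
    {"finance": "Budgets and invoices"},
    {"force_review_below_confidence": 0.5},
    {"similar_text_label": "finance"},
)
print(ctx["sources"])  # ['taxonomy', 'policy', 'history']
```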
+ 
+ ### 3. Logging
+ 
+ * `app/logging/inference_log.py` → logs every prediction
+ * `app/logging/context_log.py` → logs the context used in each classification
+ * Logs are stored as JSON in `logs/`
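A minimal JSON-lines logger in the same spirit (the file name and record fields here are assumptions; see `app/logging/inference_log.py` for the real format):

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def log_inference(log_path: Path, text: str, label: str, confidence: float) -> None:
    """Append one JSON record per prediction (JSON-lines style)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "text": text,
        "label": label,
        "confidence": confidence,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_file = Path(tempfile.mkdtemp()) / "inference.jsonl"
log_inference(log_file, "Invoice overdue", "finance", 0.91)
last = json.loads(log_file.read_text().splitlines()[-1])
print(last["label"], last["confidence"])  # finance 0.91
```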
+ 
+ ### 4. MCP Servers
+ 
+ * `taxonomy_server` → serves categories and descriptions from SQLite
+ * `policy_server` → serves policy rules from YAML
+ * `history_server` → serves historical label data from CSV
+ * All are reached via HTTP endpoints
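The history server's core operation is a lookup over prior labels. A stdlib sketch over a `labels.csv`-shaped file (the column names are assumed; see `mcp_servers/history_server/data/labels.csv` for the real layout):

```python
import csv
import io

# Stand-in for mcp_servers/history_server/data/labels.csv (columns assumed).
LABELS_CSV = """text,label
invoice payment overdue,finance
employee benefits enrollment,hr
contract breach liability,legal
"""

def lookup_history(query: str) -> list[str]:
    """Return labels of historical rows sharing any word with the query."""
    query_words = set(query.lower().split())
    matches = []
    for row in csv.DictReader(io.StringIO(LABELS_CSV)):
        if query_words & set(row["text"].lower().split()):
            matches.append(row["label"])
    return matches

print(lookup_history("overdue invoice from vendor"))  # ['finance']
```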
+ 
+ ### 5. Scripts
+ 
+ * `train_model.py` → trains and saves the sklearn pipeline
+ * `evaluate.py` → runs offline evaluation
+ * `populate_taxonomy.py` → populates the SQLite taxonomy
+ * `seed_data.py` → seeds the MCP JSON files
+ 
+ ### 6. Frontend UI
+ 
+ * Simple interface in `ui/templates/index.html`
+ * Uses JS (`static/script.js`) to call the `/classify` endpoint
+ * Styled via `static/style.css`
+ 
+ ---
+ 
+ ## Recommendations
+ 
+ * Use a **larger, more diverse dataset** for real-world deployment to avoid overfitting
+ * Use **sigmoid calibration** for realistic confidence scores
+ * Keep logs for **auditability** and context traceability
+ * Run tests regularly with `pytest -v` to ensure stability
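Sigmoid (Platt) calibration can be layered onto the baseline with scikit-learn's `CalibratedClassifierCV`; a toy sketch (not the project's actual training code — the dataset and `cv=2` are chosen only to keep the example small):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = [
    "quarterly revenue forecast", "invoice payment overdue",
    "budget approval meeting", "expense report audit",
    "new hire onboarding", "employee benefits enrollment",
    "payroll schedule update", "performance review cycle",
]
labels = ["finance"] * 4 + ["hr"] * 4

# Wrap the classifier in sigmoid calibration; cv=2 keeps the toy example valid.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                                   method="sigmoid", cv=2)),
])
pipeline.fit(texts, labels)

# Calibrated probabilities sum to 1 and are less overconfident on tiny data.
probs = pipeline.predict_proba(["invoice for the budget meeting"])[0]
print(dict(zip(pipeline.classes_, probs.round(3))))
```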
+ 
+ ---
+ 
+ ## References / Docs
+ 
+ * `docs/TECH_DEBT.md` → technical debt notes and improvement suggestions
+ * `data/samples/` → sample training/evaluation datasets
+ * `models/trained_pipeline.joblib` → pretrained baseline model
+ 
+ ---
+ 
+ ## Contact / Author
+ 
+ Repository: [LeonardoMdSACode / Context-aware-NLP-classification-platform-with-MCP](https://github.com/LeonardoMdSACode/Context-aware-NLP-classification-platform-with-MCP)
+ 
+ ---
+ 
+ ## License
+ 
+ This project is licensed under the MIT License. See the LICENSE file for details.
app/models/trained_pipeline.joblib DELETED
Binary file (5.91 kB)
 
models/trained_pipeline.joblib CHANGED
Binary files a/models/trained_pipeline.joblib and b/models/trained_pipeline.joblib differ