# NLProxy Documentation Overview

This document provides a high-level architecture map for the `nlproxy` project and points to the package-level documentation modules.
## Purpose

NLProxy is an enterprise-grade prompt compression and LLM proxy SDK. It is built to reduce prompt cost, enforce prompt safety, and orchestrate multiple LLM providers with caching, firewall controls, and production-ready FastAPI deployment.
## Architecture

The repository is organized into the following logical modules:

- `core/`
  - Implements the compression pipeline, safety checks, and prompt reshaping logic.
- `llm/`
  - Orchestrates multi-provider LLM interaction with retry, rate limiting, circuit breaking, and fallback.
- `firewall/`
  - Detects prompt injection and jailbreak attacks using regex and semantic matching.
- `cache/`
  - Provides Redis-based semantic caching for repeated prompt responses.
- `service/`
  - Coordinates the full prompt compression workflow and thread pool execution.
- `server/`
  - Contains the FastAPI application, API routes, dependency initialization, and lifecycle management.
- `cli/`
  - Includes CLI commands for starting the server, compressing prompts, downloading models, and running tests.
- `utils/`
  - Shares constants and logging configuration across the project.
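
The retry and fallback behavior attributed to `llm/` above can be pictured with a minimal sketch. This is not the project's implementation: `generate_with_fallback`, `AllProvidersFailed`, and the provider callables are hypothetical names chosen for illustration, and rate limiting and circuit breaking are left out to keep the example short.

```python
import time
from typing import Callable, Optional, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider has exhausted its retries."""


def generate_with_fallback(
    prompt: str,
    providers: Sequence[Callable[[str], str]],
    max_retries: int = 2,
    backoff_seconds: float = 0.5,
) -> str:
    """Try each provider in order, retrying failures before falling back."""
    last_error: Optional[Exception] = None
    attempts = 0
    for provider in providers:
        for attempt in range(max_retries + 1):
            attempts += 1
            try:
                return provider(prompt)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                # Exponential backoff, but only if this provider will be retried.
                if attempt < max_retries:
                    time.sleep(backoff_seconds * (2 ** attempt))
    raise AllProvidersFailed(f"{attempts} attempts failed") from last_error
```

Providers are tried strictly in the order given, so the preferred provider should come first and cheaper or slower fallbacks later.
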
## Documentation Modules

Each package has dedicated reference documentation in the `docs/` directory:

- `docs/cache.md`
- `docs/cli.md`
- `docs/core.md`
- `docs/firewall.md`
- `docs/llm.md`
- `docs/server.md`
- `docs/service.md`
- `docs/utils.md`
- `docs/tests.md`
## Deployment Notes

- The FastAPI application is created in `server/main.py` and exposed as `app`.
- Configuration is loaded in `server/config.py` from environment variables that carry the `NLPROXY_` prefix.
- The `cli/` module includes the `runserver`, `download_models`, and `compress` commands for operational control.
- Local model files must be provisioned in `nlproxy/models/`, either manually or by running the model downloader.
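
To illustrate the prefixed-environment-variable convention described above, here is a minimal standard-library sketch. The project itself uses Pydantic models for configuration; the `Settings` fields (`host`, `port`, `redis_url`), their defaults, and `load_settings` are assumptions made for this example and are not taken from `server/config.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Hypothetical fields; the real settings live in server/config.py.
    host: str = "127.0.0.1"
    port: int = 8000
    redis_url: str = "redis://localhost:6379/0"


def load_settings(
    env: Mapping[str, str] = os.environ, prefix: str = "NLPROXY_"
) -> Settings:
    """Build Settings from variables such as NLPROXY_PORT, ignoring all others."""
    raw = {
        key[len(prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(prefix)
    }
    return Settings(
        host=raw.get("host", Settings.host),
        port=int(raw.get("port", Settings.port)),  # fails fast on non-numeric input
        redis_url=raw.get("redis_url", Settings.redis_url),
    )
```

Under these assumptions, exporting `NLPROXY_PORT=9000` before startup would override the port while the other fields keep their defaults.
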
## Runtime Flow

1. Server startup triggers `server/dependencies.startup()`.
2. The compression pipeline is initialized in `service/compression.py`.
3. API requests are handled by `server/apis/chat.py`.
4. Prompts pass through firewall analysis, compression, LLM generation, and post-LLM verification, in that order.
5. Observability data is exposed at the `/metrics` and `/health` endpoints.
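
The order of stages in step 4 can be sketched as a single request handler. Everything below is illustrative: `handle_chat`, `ChatResult`, the single regex rule, and the whitespace-only `compress` are hypothetical stand-ins, not the logic in `server/apis/chat.py` or `core/`.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical rule; a real firewall would combine many rules with semantic matching.
INJECTION_PATTERN = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)


@dataclass
class ChatResult:
    blocked: bool
    response: str = ""
    stages: list[str] = field(default_factory=list)  # stages that actually ran


def firewall_allows(prompt: str) -> bool:
    return INJECTION_PATTERN.search(prompt) is None


def compress(prompt: str) -> str:
    # Placeholder compression: collapse redundant whitespace only.
    return " ".join(prompt.split())


def verify(response: str) -> bool:
    # Placeholder post-LLM check: reject empty responses.
    return bool(response.strip())


def handle_chat(prompt: str, llm: Callable[[str], str]) -> ChatResult:
    result = ChatResult(blocked=False)
    result.stages.append("firewall")
    if not firewall_allows(prompt):
        result.blocked = True  # stop before any tokens are spent on the LLM
        return result
    result.stages.append("compression")
    compact = compress(prompt)
    result.stages.append("llm")
    response = llm(compact)
    result.stages.append("verification")
    if not verify(response):
        result.blocked = True
        return result
    result.response = response
    return result
```

Running the firewall first means a rejected prompt never reaches compression or the provider, which is one plausible reason for the ordering listed above.
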
## Conventions

- Configuration values are strongly typed using Pydantic models.
- The service avoids automatic external model downloads at runtime; the downloader CLI is used instead.
- Shared constants are centralized under `utils/constants.py`.
- Caching and external services use connection pooling and predictable timeout settings.