NLProxy Documentation Overview
This document provides a high-level architecture map for the nlproxy project and points to package-level documentation modules.
Purpose
NLProxy is an enterprise-grade prompt compression and LLM proxy SDK. It is built to reduce prompt cost, enforce prompt safety, and orchestrate multiple LLM providers with caching, firewall controls, and production-ready FastAPI deployment.
Architecture
The repository is organized into the following logical modules:
- core/ - Implements the compression pipeline, safety checks, and prompt reshaping logic.
- llm/ - Orchestrates multi-provider LLM interaction with retry, rate limiting, circuit breaking, and fallback.
- firewall/ - Detects prompt injection and jailbreak attacks using regex and semantic matching.
- cache/ - Provides Redis-based semantic caching for repeated prompt responses.
- service/ - Coordinates the full prompt compression workflow and thread pool execution.
- server/ - Contains the FastAPI application, API routes, dependency initialization, and lifecycle management.
- cli/ - Includes CLI commands for starting the server, compressing prompts, downloading models, and running tests.
- utils/ - Shares constants and logging configuration across the project.
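The retry and fallback behavior attributed to llm/ can be pictured with a minimal sketch. It is not the project's implementation: `generate_with_fallback`, `AllProvidersFailed`, and the two stand-in providers are hypothetical names chosen for illustration, and rate limiting and circuit breaking are left out.

```python
import time
from typing import Callable, List, Sequence


class AllProvidersFailed(Exception):
    """Raised when every provider has exhausted its retries (hypothetical name)."""


def generate_with_fallback(
    prompt: str,
    providers: Sequence[Callable[[str], str]],
    max_retries: int = 2,
    base_delay: float = 0.5,
) -> str:
    """Try each provider in order, retrying with exponential backoff before falling back."""
    errors: List[Exception] = []
    for provider in providers:
        for attempt in range(max_retries + 1):
            try:
                return provider(prompt)
            except Exception as exc:  # a real client would catch narrower error types
                errors.append(exc)
                if attempt < max_retries:
                    # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between retries.
                    time.sleep(base_delay * (2 ** attempt))
    raise AllProvidersFailed(f"{len(errors)} attempts failed")


# Hypothetical stand-in providers used only for this demonstration.
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def stable_provider(prompt: str) -> str:
    return f"echo: {prompt}"


# base_delay=0.0 keeps the demonstration from sleeping between retries.
result = generate_with_fallback(
    "hello", [flaky_provider, stable_provider], base_delay=0.0
)
```

Ordering the providers as a plain sequence keeps the fallback policy explicit: the first entry is the preferred provider and later entries are used only after its retries run out.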
Documentation Modules
Each package has dedicated reference documentation in the docs/ directory:
- docs/cache.md
- docs/cli.md
- docs/core.md
- docs/firewall.md
- docs/llm.md
- docs/server.md
- docs/service.md
- docs/utils.md
- docs/tests.md
Deployment Notes
- The FastAPI application is created in server/main.py and exposed as app.
- Configuration is loaded from environment variables with the NLPROXY_ prefix in server/config.py.
- The CLI module includes runserver, download_models, and compress commands for operational control.
- Local model files must be provisioned in nlproxy/models/ or by running the model downloader.
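As an illustration of prefix-based configuration, the sketch below reads NLPROXY_-prefixed variables into a typed settings object. It uses a standard-library dataclass as a stand-in so that it runs without dependencies; the project itself states that it uses Pydantic models. The field names (`redis_url`, `request_timeout_s`, `firewall_enabled`) and the `load_settings` helper are assumptions, not the contents of server/config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

ENV_PREFIX = "NLPROXY_"


@dataclass(frozen=True)
class Settings:
    # Hypothetical fields; the real ones are defined in server/config.py.
    redis_url: str = "redis://localhost:6379/0"
    request_timeout_s: float = 30.0
    firewall_enabled: bool = True


def _to_bool(raw: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from variables that start with NLPROXY_, falling back to defaults."""

    def get(name: str) -> Optional[str]:
        return env.get(ENV_PREFIX + name.upper())

    defaults = Settings()
    raw_url = get("redis_url")
    raw_timeout = get("request_timeout_s")
    raw_firewall = get("firewall_enabled")
    return Settings(
        redis_url=raw_url if raw_url is not None else defaults.redis_url,
        request_timeout_s=(
            float(raw_timeout) if raw_timeout is not None else defaults.request_timeout_s
        ),
        firewall_enabled=(
            _to_bool(raw_firewall) if raw_firewall is not None else defaults.firewall_enabled
        ),
    )


# Passing an explicit mapping instead of os.environ keeps the example deterministic.
settings = load_settings(
    {"NLPROXY_REQUEST_TIMEOUT_S": "5", "NLPROXY_FIREWALL_ENABLED": "false"}
)
```

Variables without the prefix are ignored, so unrelated environment settings cannot override the service configuration by accident.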
Runtime Flow
- Server startup triggers server/dependencies.startup().
- The compression pipeline is initialized in service/compression.py.
- API requests are handled by server/apis/chat.py.
- Prompts pass through firewall analysis, compression, LLM generation, and post-LLM verification.
- Observability data is exposed on /metrics and /health paths.
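The four request stages listed above (firewall analysis, compression, LLM generation, post-LLM verification) can be sketched as a simple sequential pipeline. Every function here is a hypothetical placeholder: the regex is an invented example pattern, the compression step only collapses whitespace, and the LLM call is stubbed, so none of this reflects the logic in server/apis/chat.py or service/compression.py.

```python
import re
from dataclasses import dataclass

# Hypothetical injection pattern; the real firewall also uses semantic matching.
INJECTION_RE = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)


@dataclass
class ChatResult:
    blocked: bool
    text: str


def firewall_allows(prompt: str) -> bool:
    """Stage 1: reject prompts that match a known injection pattern."""
    return INJECTION_RE.search(prompt) is None


def compress(prompt: str) -> str:
    """Stage 2: placeholder compression that collapses runs of whitespace."""
    return " ".join(prompt.split())


def generate(prompt: str) -> str:
    """Stage 3: placeholder for the LLM provider call."""
    return f"response to: {prompt}"


def verify(response: str) -> bool:
    """Stage 4: placeholder post-LLM check that rejects empty output."""
    return bool(response.strip())


def handle_chat(prompt: str) -> ChatResult:
    """Run the four stages in order, stopping early when a check fails."""
    if not firewall_allows(prompt):
        return ChatResult(blocked=True, text="prompt rejected by firewall")
    compressed = compress(prompt)
    response = generate(compressed)
    if not verify(response):
        return ChatResult(blocked=True, text="response failed verification")
    return ChatResult(blocked=False, text=response)
```

Running the firewall before compression means a rejected prompt never reaches the more expensive stages.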
Conventions
- Configuration values are strongly typed using Pydantic models.
- The service avoids automatic external model downloads at runtime; the downloader CLI is used instead.
- Shared constants are centralized under utils/constants.py.
- Caching and external services use connection pooling and predictable timeout settings.