NLProxy Documentation Overview

This document provides a high-level architecture map for the nlproxy project and points to package-level documentation modules.

Purpose

NLProxy is a prompt compression and LLM proxy SDK. It reduces prompt cost through compression, enforces prompt safety with firewall controls, and orchestrates multiple LLM providers, with semantic caching and a FastAPI server for production deployment.

Architecture

The repository is organized into the following logical modules:

  • core/
    • Implements the compression pipeline, safety checks, and prompt reshaping logic.
  • llm/
    • Orchestrates multi-provider LLM interaction with retry, rate limiting, circuit breaking, and fallback.
  • firewall/
    • Detects prompt injection and jailbreak attacks using regex and semantic matching.
  • cache/
    • Provides Redis-based semantic caching for repeated prompt responses.
  • service/
    • Coordinates the full prompt compression workflow and thread pool execution.
  • server/
    • Contains the FastAPI application, API routes, dependency initialization, and lifecycle management.
  • cli/
    • Includes CLI commands for starting the server, compressing prompts, downloading models, and running tests.
  • utils/
    • Shares constants and logging configuration across the project.
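
To make the llm/ responsibilities more concrete, here is a minimal sketch of retry with provider fallback. It is not the project's implementation: the names generate_with_fallback, AllProvidersFailed, flaky_primary, and stable_secondary are hypothetical, providers are modelled as plain callables, and rate limiting and circuit breaking are left out for brevity.

```python
from __future__ import annotations

import time
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider has exhausted its retries (hypothetical name)."""


def generate_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
    max_retries: int = 2,
    base_delay: float = 0.0,
) -> tuple[str, str]:
    """Try providers in order, retrying each one before falling back to the next.

    Returns (provider_name, response) from the first provider that succeeds.
    """
    errors: list[str] = []
    for name, call in providers:
        for attempt in range(max_retries + 1):
            try:
                return name, call(prompt)
            except Exception as exc:  # a real client would catch narrower errors
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                if attempt < max_retries:
                    # Exponential backoff between retries of the same provider.
                    time.sleep(base_delay * (2 ** attempt))
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers used only for illustration.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def stable_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


result = generate_with_fallback(
    "hi",
    [("primary", flaky_primary), ("secondary", stable_secondary)],
    max_retries=1,
)
```

Here the primary provider fails on both attempts, so the call falls through to the secondary one. A circuit breaker would extend this by skipping a provider entirely after repeated failures.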

Documentation Modules

Each package, along with the test suite, has dedicated reference documentation in the docs/ directory:

  • docs/cache.md
  • docs/cli.md
  • docs/core.md
  • docs/firewall.md
  • docs/llm.md
  • docs/server.md
  • docs/service.md
  • docs/utils.md
  • docs/tests.md

Deployment Notes

  • The FastAPI application is created in server/main.py and exposed as app.
  • Configuration is loaded from environment variables with the NLPROXY_ prefix in server/config.py.
  • The CLI module includes runserver, download_models, and compress commands for operational control.
  • Local model files must be provisioned, either by placing them in nlproxy/models/ or by running the model downloader.
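
The prefix-based configuration loading described above can be sketched as follows. This is a standard-library stand-in, not the contents of server/config.py: the project uses Pydantic models, and the Settings field names and defaults here are hypothetical. Only the NLPROXY_ prefix comes from this document.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, fields
from typing import Mapping

ENV_PREFIX = "NLPROXY_"  # prefix named in the deployment notes


@dataclass(frozen=True)
class Settings:
    # Field names and defaults are hypothetical, chosen for illustration.
    host: str = "127.0.0.1"
    port: int = 8000
    redis_url: str = "redis://localhost:6379/0"
    request_timeout_s: float = 30.0


def load_settings(env: Mapping[str, str] | None = None) -> Settings:
    """Build Settings from variables named NLPROXY_<FIELD_NAME>.

    Each value is coerced to the type of the field's default, so a
    malformed number fails at load time rather than at first use.
    """
    env = os.environ if env is None else env
    overrides = {}
    for field in fields(Settings):
        raw = env.get(ENV_PREFIX + field.name.upper())
        if raw is not None:
            overrides[field.name] = type(field.default)(raw)
    return Settings(**overrides)


settings = load_settings(
    {"NLPROXY_PORT": "9000", "NLPROXY_REQUEST_TIMEOUT_S": "2.5", "UNRELATED": "x"}
)
```

Variables without the prefix are ignored, and unset fields keep their defaults.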

Runtime Flow

  1. Server startup triggers server/dependencies.startup().
  2. The compression pipeline is initialized in service/compression.py.
  3. API requests are handled by server/apis/chat.py.
  4. Prompts pass through firewall analysis, compression, LLM generation, and post-LLM verification.
  5. Observability data is exposed on the /metrics and /health paths.
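
The request path in step 4 can be sketched as a chain of four stages. Everything below is an illustrative assumption rather than the code in server/apis/chat.py: the function names, the single regex pattern, the whitespace-collapsing "compressor", and the empty-response check are stand-ins for the real firewall, compression pipeline, and verification logic.

```python
from __future__ import annotations

import re
from typing import Callable

# Hypothetical pattern; the real firewall also uses semantic matching.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]


class PromptBlocked(ValueError):
    """Raised when the firewall stage rejects a prompt (hypothetical name)."""


def firewall_check(prompt: str) -> None:
    """Stage 1: reject prompts that match a known injection pattern."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(prompt):
            raise PromptBlocked(f"matched {pattern.pattern!r}")


def compress(prompt: str) -> str:
    """Stage 2: stand-in compressor that only collapses whitespace."""
    return " ".join(prompt.split())


def verify(response: str) -> str:
    """Stage 4: stand-in post-LLM check that rejects empty output."""
    if not response.strip():
        raise ValueError("empty LLM response")
    return response


def handle_chat(prompt: str, generate: Callable[[str], str]) -> str:
    """Run firewall analysis, compression, generation, then verification."""
    firewall_check(prompt)
    compact = compress(prompt)
    return verify(generate(compact))


# A fake LLM that upper-cases its input keeps the example self-contained.
reply = handle_chat("  hello   world ", lambda p: p.upper())
```

Running the firewall before compression means rejected prompts never incur compression or LLM cost.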

Conventions

  • Configuration values are strongly typed using Pydantic models.
  • The service avoids automatic external model downloads at runtime; the downloader CLI is used instead.
  • Shared constants are centralized under utils/constants.py.
  • Caching and external services use connection pooling and predictable timeout settings.