NLProxy Documentation Overview

This document provides a high-level architecture map for the nlproxy project and points to package-level documentation modules.

Purpose

NLProxy is an enterprise-grade prompt compression and LLM proxy SDK. It is built to reduce prompt cost, enforce prompt safety, and orchestrate multiple LLM providers with caching, firewall controls, and production-ready FastAPI deployment.

Architecture

The repository is organized into the following logical modules:

core/
- Implements the compression pipeline, safety checks, and prompt reshaping logic.
llm/
- Orchestrates multi-provider LLM interaction with retry, rate limiting, circuit breaking, and fallback.
firewall/
- Detects prompt injection and jailbreak attacks using regex and semantic matching.
cache/
- Provides Redis-based semantic caching for repeated prompt responses.
service/
- Coordinates the full prompt compression workflow and thread pool execution.
server/
- Contains the FastAPI application, API routes, dependency initialization, and lifecycle management.
cli/
- Includes CLI commands for starting the server, compressing prompts, downloading models, and running tests.
utils/
- Shares constants and logging configuration across the project.
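
As an illustration of the retry and fallback behavior attributed to llm/ above, here is a minimal sketch. The names `call_with_fallback` and `ProviderError`, and the idea of modelling a provider as a plain callable, are assumptions made for this example and are not taken from the nlproxy codebase. Rate limiting and circuit breaking are left out to keep it short.

```python
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[Callable[[str], str]],
    max_attempts: int = 2,
) -> str:
    """Try each provider in order, retrying each up to max_attempts times."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for _ in range(max_attempts):
            try:
                return provider(prompt)
            except ProviderError as exc:
                # Remember the failure, then retry or fall back to the next provider.
                last_error = exc
    raise ProviderError("all providers failed") from last_error
```

With a failing primary and a working secondary, the primary is tried `max_attempts` times before the secondary answers. A production orchestrator would typically also add backoff between attempts.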

Documentation Modules

Each package, along with the test suite, has dedicated reference documentation in the docs/ directory:

docs/cache.md
docs/cli.md
docs/core.md
docs/firewall.md
docs/llm.md
docs/server.md
docs/service.md
docs/utils.md
docs/tests.md

Deployment Notes

The FastAPI application is created in server/main.py and exposed as app.
Configuration is loaded from environment variables with the NLPROXY_ prefix in server/config.py.
The CLI module includes runserver, download_models, and compress commands for operational control.
Local model files must be provisioned either by placing them in nlproxy/models/ or by running the model downloader.
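
To make the NLPROXY_ prefix convention concrete, the sketch below reads prefixed environment variables into a typed settings object. It uses only the standard library so that it stays self-contained. The field names `redis_url` and `request_timeout_s` and the function `load_settings` are hypothetical; the actual fields are defined in server/config.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping

ENV_PREFIX = "NLPROXY_"


@dataclass(frozen=True)
class Settings:
    # Hypothetical fields, chosen only for illustration.
    redis_url: str = "redis://localhost:6379/0"
    request_timeout_s: float = 30.0


def load_settings(environ: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from variables named NLPROXY_<FIELD_NAME_UPPERCASED>."""
    defaults = Settings()
    return Settings(
        redis_url=environ.get(ENV_PREFIX + "REDIS_URL", defaults.redis_url),
        request_timeout_s=float(
            environ.get(ENV_PREFIX + "REQUEST_TIMEOUT_S", defaults.request_timeout_s)
        ),
    )
```

Under these assumptions, setting `NLPROXY_REQUEST_TIMEOUT_S=5` in the environment overrides the timeout, while unset variables fall back to the defaults.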

Runtime Flow

Server startup triggers server/dependencies.startup().
The compression pipeline is initialized in service/compression.py.
API requests are handled by server/apis/chat.py.
Prompts pass through firewall analysis, compression, LLM generation, and post-LLM verification.
Observability data is exposed on the /metrics and /health paths.
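
The order of the request stages described above (firewall analysis, compression, LLM generation, post-LLM verification) can be sketched as follows. Every name here is hypothetical, and each stage is a deliberately trivial stand-in: the real firewall combines regex and semantic matching, and the real compressor does far more than collapse whitespace.

```python
import re
from typing import Callable

# Hypothetical pattern, used only to illustrate a regex-based firewall rule.
INJECTION_PATTERN = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)


class BlockedPrompt(Exception):
    """Raised when the firewall stage rejects a prompt."""


def firewall_check(prompt: str) -> None:
    if INJECTION_PATTERN.search(prompt):
        raise BlockedPrompt("possible prompt injection")


def compress(prompt: str) -> str:
    # Stand-in for compression: collapse runs of whitespace only.
    return " ".join(prompt.split())


def verify(response: str) -> str:
    # Stand-in for post-LLM verification: reject empty responses.
    if not response.strip():
        raise ValueError("empty LLM response")
    return response


def handle_request(prompt: str, generate: Callable[[str], str]) -> str:
    """Run the four stages in order; a blocked prompt never reaches the LLM."""
    firewall_check(prompt)
    compressed = compress(prompt)
    response = generate(compressed)
    return verify(response)
```

Because the firewall runs first, a rejected prompt costs no compression work and no provider call.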

Conventions

Configuration values are strongly typed using Pydantic models.
The service avoids automatic external model downloads at runtime; the downloader CLI is used instead.
Shared constants are centralized under utils/constants.py.
Caching and external services use connection pooling and predictable timeout settings.
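
One way to honor the no-download-at-runtime convention is to fail fast when a model is missing rather than fetching it. The helper below is a sketch under that assumption. `require_local_model` is a hypothetical name and is not part of nlproxy.

```python
from pathlib import Path


def require_local_model(models_dir: Path, name: str) -> Path:
    """Return the model path, or raise instead of downloading at runtime."""
    path = models_dir / name
    if not path.exists():
        raise FileNotFoundError(
            f"model '{name}' not found in {models_dir}; run the model downloader first"
        )
    return path
```

Calling such a check during startup would surface a missing model as a clear error before any request is served.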