
Luiserb

first commit

2129c29 15 days ago

	# NLProxy Documentation Overview

	This document provides a high-level architecture map for the `nlproxy` project and points to package-level documentation modules.

	## Purpose

	NLProxy is an enterprise-grade prompt compression and LLM proxy SDK. It is built to reduce prompt cost, enforce prompt safety, and orchestrate multiple LLM providers with caching, firewall controls, and production-ready FastAPI deployment.

	## Architecture

	The repository is organized into the following logical modules:

	- `core/`
	  - Implements the compression pipeline, safety checks, and prompt reshaping logic.
	- `llm/`
	  - Orchestrates multi-provider LLM interaction with retry, rate limiting, circuit breaking, and fallback.
	- `firewall/`
	  - Detects prompt injection and jailbreak attacks using regex and semantic matching.
	- `cache/`
	  - Provides Redis-based semantic caching for repeated prompt responses.
	- `service/`
	  - Coordinates the full prompt compression workflow and thread pool execution.
	- `server/`
	  - Contains the FastAPI application, API routes, dependency initialization, and lifecycle management.
	- `cli/`
	  - Includes CLI commands for starting the server, compressing prompts, downloading models, and running tests.
	- `utils/`
	  - Shares constants and logging configuration across the project.
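
The retry-and-fallback behavior attributed to `llm/` can be illustrated with a minimal, standard-library sketch. This is not the project's implementation: the `Provider` type, `generate_with_fallback`, `AllProvidersFailed`, and the two toy providers are hypothetical names introduced only for illustration, and rate limiting and circuit breaking are left out.

```python
import time
from typing import Callable, Sequence

# Hypothetical provider type: any callable that maps a prompt to a completion.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider has exhausted its retries."""


def generate_with_fallback(
    prompt: str,
    providers: Sequence[Provider],
    max_retries: int = 2,
    backoff_seconds: float = 0.0,
) -> str:
    """Try providers in order, retrying each one before falling back to the next."""
    last_error = None
    attempts = 0
    for provider in providers:
        for attempt in range(max_retries):
            attempts += 1
            try:
                return provider(prompt)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
                # Exponential backoff between attempts on the same provider.
                time.sleep(backoff_seconds * (2 ** attempt))
    raise AllProvidersFailed(f"{attempts} attempts failed") from last_error


# Toy providers (assumptions): the first always fails, the second echoes.
def flaky(prompt: str) -> str:
    raise TimeoutError("primary provider unavailable")


def stable(prompt: str) -> str:
    return f"echo: {prompt}"


print(generate_with_fallback("hi", [flaky, stable]))  # prints "echo: hi"
```

Ordering the providers by preference keeps the fallback policy in one place; a circuit breaker would typically sit in front of each provider and skip it while it is known to be failing.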

	## Documentation Modules

	Each package has dedicated reference documentation in the `docs/` directory:

	- `docs/cache.md`
	- `docs/cli.md`
	- `docs/core.md`
	- `docs/firewall.md`
	- `docs/llm.md`
	- `docs/server.md`
	- `docs/service.md`
	- `docs/utils.md`
	- `docs/tests.md`

	## Deployment Notes

	- The FastAPI application is created in `server/main.py` and exposed as `app`.
	- Configuration is loaded from environment variables with the `NLPROXY_` prefix in `server/config.py`.
	- The `cli/` module includes the `runserver`, `download_models`, and `compress` commands for operational control.
	- Local model files must be provisioned in `nlproxy/models/` or by running the model downloader.
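
As a rough illustration of prefix-based configuration, the sketch below reads `NLPROXY_`-prefixed environment variables into a typed settings object. The project is described as using Pydantic models for this; a standard-library dataclass stands in here only to keep the example dependency-free. The field names (`host`, `port`, `redis_url`, `request_timeout`), their defaults, and the `load_settings` helper are assumptions, not the actual contents of `server/config.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping

PREFIX = "NLPROXY_"


@dataclass(frozen=True)
class Settings:
    # Field names and defaults are hypothetical, chosen only for illustration.
    host: str = "127.0.0.1"
    port: int = 8000
    redis_url: str = "redis://localhost:6379/0"
    request_timeout: float = 30.0


# Explicit type coercion per field, since environment values are always strings.
_CASTERS = {"host": str, "port": int, "redis_url": str, "request_timeout": float}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from variables such as NLPROXY_PORT, ignoring unrelated keys."""
    overrides = {}
    for name, cast in _CASTERS.items():
        raw = env.get(PREFIX + name.upper())
        if raw is not None:
            overrides[name] = cast(raw)
    return Settings(**overrides)


settings = load_settings({"NLPROXY_PORT": "9000", "UNRELATED": "ignored"})
print(settings.port)  # prints 9000
```

Passing the environment as a mapping makes the loader easy to test without mutating `os.environ`.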

	## Runtime Flow

	1. Server startup triggers the `startup()` function in the `server/dependencies` module.
	2. The compression pipeline is initialized in `service/compression.py`.
	3. API requests are handled by `server/apis/chat.py`.
	4. Prompts pass through firewall analysis, compression, LLM generation, and post-LLM verification.
	5. Observability data is exposed at the `/metrics` and `/health` endpoints.
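
The request path in step 4 can be sketched as a chain of four stages. Everything below is a simplified stand-in under stated assumptions: the single regex rule, the whitespace-collapsing `compress`, the `verify` check, and the `handle_chat` and `ChatResult` names are hypothetical and do not reflect the code in `server/apis/chat.py`.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical injection rule; a real firewall would combine many patterns
# with semantic matching.
INJECTION = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)


@dataclass
class ChatResult:
    blocked: bool
    compressed_prompt: str
    response: str


def firewall_allows(prompt: str) -> bool:
    """Stage 1: reject prompts that match a known injection pattern."""
    return INJECTION.search(prompt) is None


def compress(prompt: str) -> str:
    """Stage 2: placeholder compression that collapses runs of whitespace."""
    return " ".join(prompt.split())


def verify(response: str) -> str:
    """Stage 4: placeholder post-LLM check that rejects empty output."""
    if not response.strip():
        raise ValueError("empty LLM response")
    return response


def handle_chat(prompt: str, llm: Callable[[str], str]) -> ChatResult:
    """Run firewall analysis, compression, generation, and verification in order."""
    if not firewall_allows(prompt):
        return ChatResult(blocked=True, compressed_prompt="", response="")
    compressed = compress(prompt)
    response = verify(llm(compressed))  # Stage 3 is the injected llm callable
    return ChatResult(blocked=False, compressed_prompt=compressed, response=response)


result = handle_chat("  summarize   this  text ", lambda p: p.upper())
print(result.response)  # prints "SUMMARIZE THIS TEXT"
```

Running the firewall before compression means blocked prompts never incur compression or provider cost.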

	## Conventions

	- Configuration values are strongly typed using Pydantic models.
	- The service avoids automatic external model downloads at runtime; the downloader CLI is used instead.
	- Shared constants are centralized under `utils/constants.py`.
	- Caching and external services use connection pooling and predictable timeout settings.