Spaces:

abc1181
/

HermesAgent

Paused

App Files Files Community

HermesAgent / website /docs /developer-guide /context-engine-plugin.md

cjovs

Upload 1680 files

b5b9c2e verified about 1 month ago

preview code

raw

history blame contribute delete

6.92 kB

	---
	sidebar_position: 9
	title: "Context Engine Plugins"
	description: "How to build a context engine plugin that replaces the built-in ContextCompressor"
	---

	# Building a Context Engine Plugin

	Context engine plugins replace the built-in `ContextCompressor` with an alternative strategy for managing conversation context. For example, a Lossless Context Management (LCM) engine that builds a knowledge DAG instead of lossy summarization.

	## How it works

	The agent's context management is built on the `ContextEngine` ABC (`agent/context_engine.py`). The built-in `ContextCompressor` is the default implementation. Plugin engines must implement the same interface.

	Only one context engine can be active at a time. Selection is config-driven:

	```yaml
	# config.yaml
	context:
	engine: "compressor" # default built-in
	engine: "lcm" # activates a plugin engine named "lcm"
	```

	Plugin engines are never auto-activated — the user must explicitly set `context.engine` to the plugin's name.

	## Directory structure

	Each context engine lives in `plugins/context_engine/<name>/`:

	```
	plugins/context_engine/lcm/
	├── __init__.py # exports the ContextEngine subclass
	├── plugin.yaml # metadata (name, description, version)
	└── ... # any other modules your engine needs
	```

	## The ContextEngine ABC

	Your engine must implement these required methods:

	```python
	from agent.context_engine import ContextEngine

	class LCMEngine(ContextEngine):

	@property
	def name(self) -> str:
	"""Short identifier, e.g. 'lcm'. Must match config.yaml value."""
	return "lcm"

	def update_from_response(self, usage: dict) -> None:
	"""Called after every LLM call with the usage dict.

	Update self.last_prompt_tokens, self.last_completion_tokens,
	self.last_total_tokens from the response.
	"""

	def should_compress(self, prompt_tokens: int = None) -> bool:
	"""Return True if compaction should fire this turn."""

	def compress(self, messages: list, current_tokens: int = None) -> list:
	"""Compact the message list and return a new (possibly shorter) list.

	The returned list must be a valid OpenAI-format message sequence.
	"""
	```

	### Class attributes your engine must maintain

	The agent reads these directly for display and logging:

	```python
	last_prompt_tokens: int = 0
	last_completion_tokens: int = 0
	last_total_tokens: int = 0
	threshold_tokens: int = 0 # when compression triggers
	context_length: int = 0 # model's full context window
	compression_count: int = 0 # how many times compress() has run
	```

	### Optional methods

	These have sensible defaults in the ABC. Override as needed:

	\| Method \| Default \| Override when \|
	\|--------\|---------\|--------------\|
	\| `on_session_start(session_id, **kwargs)` \| No-op \| You need to load persisted state (DAG, DB) \|
	\| `on_session_end(session_id, messages)` \| No-op \| You need to flush state, close connections \|
	\| `on_session_reset()` \| Resets token counters \| You have per-session state to clear \|
	\| `update_model(model, context_length, ...)` \| Updates context_length + threshold \| You need to recalculate budgets on model switch \|
	\| `get_tool_schemas()` \| Returns `[]` \| Your engine provides agent-callable tools (e.g., `lcm_grep`) \|
	\| `handle_tool_call(name, args, **kwargs)` \| Returns error JSON \| You implement tool handlers \|
	\| `should_compress_preflight(messages)` \| Returns `False` \| You can do a cheap pre-API-call estimate \|
	\| `get_status()` \| Standard token/threshold dict \| You have custom metrics to expose \|

	## Engine tools

	Context engines can expose tools the agent calls directly. Return schemas from `get_tool_schemas()` and handle calls in `handle_tool_call()`:

	```python
	def get_tool_schemas(self):
	return [{
	"name": "lcm_grep",
	"description": "Search the context knowledge graph",
	"parameters": {
	"type": "object",
	"properties": {
	"query": {"type": "string", "description": "Search query"}
	},
	"required": ["query"],
	},
	}]

	def handle_tool_call(self, name, args, **kwargs):
	if name == "lcm_grep":
	results = self._search_dag(args["query"])
	return json.dumps({"results": results})
	return json.dumps({"error": f"Unknown tool: {name}"})
	```

	Engine tools are injected into the agent's tool list at startup and dispatched automatically — no registry registration needed.

	## Registration

	### Via directory (recommended)

	Place your engine in `plugins/context_engine/<name>/`. The `__init__.py` must export a `ContextEngine` subclass. The discovery system finds and instantiates it automatically.

	### Via general plugin system

	A general plugin can also register a context engine:

	```python
	def register(ctx):
	engine = LCMEngine(context_length=200000)
	ctx.register_context_engine(engine)
	```

	Only one engine can be registered. A second plugin attempting to register is rejected with a warning.

	## Lifecycle

	```
	1. Engine instantiated (plugin load or directory discovery)
	2. on_session_start() — conversation begins
	3. update_from_response() — after each API call
	4. should_compress() — checked each turn
	5. compress() — called when should_compress() returns True
	6. on_session_end() — session boundary (CLI exit, /reset, gateway expiry)
	```

	`on_session_reset()` is called on `/new` or `/reset` to clear per-session state without a full shutdown.

	## Configuration

	Users select your engine via `hermes plugins` → Provider Plugins → Context Engine, or by editing `config.yaml`:

	```yaml
	context:
	engine: "lcm" # must match your engine's name property
	```

	The `compression` config block (`compression.threshold`, `compression.protect_last_n`, etc.) is specific to the built-in `ContextCompressor`. Your engine should define its own config format if needed, reading from `config.yaml` during initialization.

	## Testing

	```python
	from agent.context_engine import ContextEngine

	def test_engine_satisfies_abc():
	engine = YourEngine(context_length=200000)
	assert isinstance(engine, ContextEngine)
	assert engine.name == "your-name"

	def test_compress_returns_valid_messages():
	engine = YourEngine(context_length=200000)
	msgs = [{"role": "user", "content": "hello"}]
	result = engine.compress(msgs)
	assert isinstance(result, list)
	assert all("role" in m for m in result)
	```

	See `tests/agent/test_context_engine.py` for the full ABC contract test suite.

	## See also

	- [Context Compression and Caching](/docs/developer-guide/context-compression-and-caching) — how the built-in compressor works
	- [Memory Provider Plugins](/docs/developer-guide/memory-provider-plugin) — analogous single-select plugin system for memory
	- [Plugins](/docs/user-guide/features/plugins) — general plugin system overview