nexus-core / Modelfile.nexus

Add Modelfile.nexus with deterministic inference template

	# Nexus-Core inference template.
	#
	# This Modelfile is the canonical configuration for any local
	# inference engine (Ollama, llama.cpp server wrappers, custom
	# loaders) that needs to talk to a Nexus-Core orchestrated agent.
	# The PARAMETER and SYSTEM directives below align the underlying
	# model with the contracts our Rust core enforces — most
	# importantly, the SemanticInterceptor in src/validation.rs which
	# rejects any non-JSON or schema-violating completion before it
	# reaches Python.

	FROM gemma-4-8b-it-Q4_K_M.gguf

	# 128k context window. The PagedAttention allocator in src/paged_attention.rs
	# makes long contexts viable on commodity hardware by paging fixed-size
	# physical KV blocks instead of pre-reserving a contiguous tensor per
	# sequence. Branching (Tree-of-Thought, speculative decoding) shares the
	# system-prompt blocks via Copy-on-Write so this large window stays cheap
	# even under fan-out.
	PARAMETER num_ctx 128000

	# Low-variance decoding. The Cognitive Reliability Layer in
	# src/validation.rs is the source of truth for output validity, but
	# keeping temperature low here reduces the number of correction-prompt
	# round-trips the interceptor has to issue. Note that temperature 0.2
	# with top_p 0.9 still samples, so repeated runs can differ.
	PARAMETER temperature 0.2
	PARAMETER top_p 0.9
	PARAMETER repeat_penalty 1.05
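
	# Sketch of a stricter alternative, left commented out so the values
	# above stay in effect. This is an assumption, not part of the shipped
	# configuration: if repeatable output matters more than variety, greedy
	# decoding with a fixed seed is one option. The seed value 42 is
	# arbitrary, and identical output across different hardware or engine
	# versions is still not guaranteed.
	# PARAMETER temperature 0
	# PARAMETER seed 42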

	# Strict JSON output contract.
	#
	# Every response from this model is consumed by Nexus-Core's
	# SemanticInterceptor (see src/validation.rs). The interceptor will:
	#   1. Parse the response with serde_json. Any syntax error triggers the
	#      correction prompt "CRITICAL: Invalid JSON format. Expected valid
	#      JSON object." and the model is re-invoked.
	#   2. Enforce that every required key is present at the top level.
	#      A missing key triggers "CRITICAL: Schema validation failed.
	#      Missing required key: <name>. Fix the JSON structure."
	#   3. Quarantine the request and surface it to Python as a ValueError
	#      once the bounded retry budget is exhausted.
	#
	# Therefore the model MUST treat the rules below as load-bearing — the
	# orchestrator will not forward, log, or surface anything that violates
	# them.
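	#
	# Illustration only. The key names "status" and "result" are
	# hypothetical and not taken from any real task contract. Assuming a
	# contract that requires exactly those two top-level keys, the checks
	# above would play out roughly as follows:
	#
	#   {"status": "ok", "result": "done"}
	#     parses and has both keys, so it passes
	#   Sure! {"status": "ok", "result": "done"}
	#     leading prose makes it invalid JSON, so check 1 rejects it
	#   {"status": "ok"}
	#     valid JSON but "result" is missing, so check 2 rejects it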
	SYSTEM """
	You are an agent running inside the Nexus-Core deterministic
	orchestrator. You communicate with the orchestrator exclusively via
	strict JSON.

	Output rules — non-negotiable:

	1. Every response is a single, well-formed JSON object. No prose,
	   no Markdown, no code fences, no leading or trailing whitespace
	   outside the object.
	2. Every top-level key required by the active task contract MUST be
	   present. Do not invent additional keys unless the contract
	   permits them.
	3. String values are valid UTF-8. Numbers are JSON numbers (not
	   quoted). Booleans are `true` / `false`, never `True` / `False`.
	4. If you cannot satisfy the contract, return a JSON object with the
	   single key `"error"` whose value is a short human-readable string.
	   Do not narrate the failure outside the JSON object.
	5. Tool invocations are expressed as JSON of the shape
	   `{"tool_name": "...", "payload": {...}}`. The Zero-Trust MCP
	   gatekeeper (src/mcp.rs) will inspect `tool_name` against the
	   active SecurityLevel before any transport executes; tools whose
	   names start with `write_`, `delete_`, `drop_`, or `update_`
	   require human approval under `require_approval`.

	Reasoning happens inside JSON string values, never outside the
	object. When in doubt, output less prose and more structure.
	"""
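
	# Illustration of rule 5, kept outside the SYSTEM block so it does not
	# become part of the prompt. The tool names and payload fields below are
	# hypothetical and chosen only to show the prefix rule described above;
	# the actual matching logic lives in src/mcp.rs.
	#
	#   {"tool_name": "read_file", "payload": {"path": "notes.txt"}}
	#     no gated prefix, so the prefix rule alone does not demand approval
	#   {"tool_name": "delete_file", "payload": {"path": "notes.txt"}}
	#     starts with `delete_`, so human approval is required under
	#     `require_approval`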