# Reframr OpenAI-Compatible Runtime
The Reframr v3 runtime includes an OpenAI-style adapter, so apps can plug Reframr into existing chat, support, and tool-orchestration systems without writing custom prompt glue.
## Chat Completion
```python
from pathlib import Path
from reframr import ReframrModel, build_chat_completion_response
model = ReframrModel.load(Path("model.safetensors"))
response = build_chat_completion_response(
    model,
    {
        "model": "reframr-v3",
        "messages": [
            {"role": "system", "content": "Be concise and cite sources when tool results are provided."},
            {"role": "user", "content": "Summarize this customer support issue."},
        ],
        "max_tokens": 160,
        "temperature": 0.58,
    },
)
print(response["choices"][0]["message"]["content"])
```
## Streaming
```python
from reframr.openai_compat import iter_sse_chat_completion
# `request` is the same dict shape as in the chat completion example above.
for event in iter_sse_chat_completion(model, request):
    send_to_browser(event)
```
The stream emits OpenAI-style `chat.completion.chunk` SSE events and ends with:
```text
data: [DONE]
```
## Tool Loop
Register real tools in the host application. Reframr can request a tool with `<tool_call>`, the host executes the function, and the result is fed back as `<tool_result>` / `<source>` evidence.
```python
from reframr.openai_compat import run_tool_loop
def web_search(arguments: dict[str, object]) -> dict[str, object]:
    query = str(arguments["query"])
    # your_search_client stands in for your own search backend.
    result = your_search_client.search(query)
    return {
        "ok": True,
        "source": {
            "title": result.title,
            "url": result.url,
            "snippet": result.snippet,
        },
    }
response = run_tool_loop(
    model,
    {
        "model": "reframr-v3",
        "messages": [
            {"role": "user", "content": "What changed in the latest official release notes?"}
        ],
    },
    tools={"web.search": web_search},
    max_rounds=3,
)
```
If a tool is missing or fails, the adapter sends the failure back as a tool result instead of crashing. That lets Reframr answer honestly, retry with a different tool if the model requests one, or ask the user for source evidence.
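Host-side tools can mirror that contract so their own exceptions also come back as structured failures rather than crashing the loop. A sketch under the assumption that a failure is reported with the same `ok` field as the success shape above (the exact failure schema is an assumption, and `safe_tool` is a hypothetical helper, not part of the adapter):

```python
from typing import Callable


def safe_tool(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
    """Wrap a tool so exceptions become structured failure results."""

    def wrapper(arguments: dict) -> dict:
        try:
            return fn(arguments)
        except Exception as exc:
            # Report the failure back to the model instead of raising.
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}

    return wrapper
```

Registering tools as `tools={"web.search": safe_tool(web_search)}` keeps the loop running even when a backend is down.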
## CLI
```bash
python -m reframr chat-completion --model model.safetensors < request.json
```
To get SSE output, include `"stream": true` in the request JSON:
```json
{
  "model": "reframr-v3",
  "stream": true,
  "messages": [
    {"role": "user", "content": "Write a short support reply."}
  ]
}
```
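The request file can also be generated programmatically before piping it to the CLI. A minimal sketch that writes the streaming request above to `request.json` (the field names mirror the JSON example; the file name just matches the one used in the command):

```python
import json
from pathlib import Path

# Build the same streaming request shown above and write it for the CLI.
request = {
    "model": "reframr-v3",
    "stream": True,
    "messages": [
        {"role": "user", "content": "Write a short support reply."}
    ],
}
Path("request.json").write_text(json.dumps(request, indent=2))
```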
## Deployment Notes
- Keep real tools outside the model runtime and pass their outputs back as data.
- Treat source quality as part of the product: validate URLs, timestamps, permissions, and user access.
- Do not let the model fabricate tool results. If no tool result exists for a fresh fact, the app should ask for retrieval or return an uncertainty-aware answer.
- Use `session_id` with `python -m reframr serve` when you want conversation memory in the JSONL server.
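Source validation can start as a simple structural check before a tool result is fed back as `<source>` evidence. A minimal sketch (the required fields follow the `web_search` example above; the HTTPS-only policy and the helper name `is_valid_source` are assumptions, and real deployments would also check timestamps and user permissions):

```python
from urllib.parse import urlparse

REQUIRED_FIELDS = ("title", "url", "snippet")


def is_valid_source(source: dict) -> bool:
    """Reject sources with missing fields or non-HTTPS URLs before use as evidence."""
    if not all(source.get(field) for field in REQUIRED_FIELDS):
        return False
    parsed = urlparse(str(source["url"]))
    return parsed.scheme == "https" and bool(parsed.netloc)
```

Anything that fails the check can be dropped or surfaced to the user as an unverified source rather than cited by the model.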