# Reframr OpenAI-Compatible Runtime
The Reframr v3 runtime includes an OpenAI-style adapter, so apps can plug Reframr into existing chat, support, and tool-orchestration systems without writing custom prompt glue.
## Chat Completion
```python
from pathlib import Path
from reframr import ReframrModel, build_chat_completion_response
model = ReframrModel.load(Path("model.safetensors"))
response = build_chat_completion_response(
    model,
    {
        "model": "reframr-v3",
        "messages": [
            {"role": "system", "content": "Be concise and cite sources when tool results are provided."},
            {"role": "user", "content": "Summarize this customer support issue."},
        ],
        "max_tokens": 160,
        "temperature": 0.58,
    },
)
print(response["choices"][0]["message"]["content"])
```
## Streaming
```python
from reframr.openai_compat import iter_sse_chat_completion
# `request` is the same dict shape as in the chat completion example above.
for event in iter_sse_chat_completion(model, request):
    send_to_browser(event)
```
The stream emits OpenAI-style `chat.completion.chunk` SSE events and ends with:
```text
data: [DONE]
```
## Tool Loop
Register real tools in the host application. Reframr can request a tool with `<tool_call>`, the host executes the function, and the result is fed back as `<tool_result>` / `<source>` evidence.
```python
from reframr.openai_compat import run_tool_loop
def web_search(arguments: dict[str, object]) -> dict[str, object]:
    query = str(arguments["query"])
    # your_search_client stands in for your own search backend.
    result = your_search_client.search(query)
    return {
        "ok": True,
        "source": {
            "title": result.title,
            "url": result.url,
            "snippet": result.snippet,
        },
    }
response = run_tool_loop(
    model,
    {
        "model": "reframr-v3",
        "messages": [
            {"role": "user", "content": "What changed in the latest official release notes?"}
        ],
    },
    tools={"web.search": web_search},
    max_rounds=3,
)
```
If a tool is missing or fails, the adapter sends the failure back as a tool result instead of crashing. That lets Reframr answer honestly, retry with a different tool if the model requests one, or ask the user for source evidence.
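Host-side tools can mirror that contract so their own exceptions also come back as structured failures rather than crashing the loop. A sketch under the assumption that a failure is reported with the same `ok` field as the success shape above (the exact failure schema is an assumption, and `safe_tool` is a hypothetical helper, not part of the adapter):

```python
from typing import Callable


def safe_tool(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
    """Wrap a tool so exceptions become structured failure results."""

    def wrapper(arguments: dict) -> dict:
        try:
            return fn(arguments)
        except Exception as exc:
            # Report the failure back to the model instead of raising.
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}

    return wrapper
```

Registering tools as `tools={"web.search": safe_tool(web_search)}` keeps the loop running even when a backend is down.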
## CLI
```bash
python -m reframr chat-completion --model model.safetensors < request.json
```
To get SSE output, include `"stream": true` in the request JSON:
```json
{
  "model": "reframr-v3",
  "stream": true,
  "messages": [
    {"role": "user", "content": "Write a short support reply."}
  ]
}
```
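The request file can also be generated programmatically before piping it to the CLI. A minimal sketch that writes the streaming request above to `request.json` (the field names mirror the JSON example; the file name just matches the one used in the command):

```python
import json
from pathlib import Path

# Build the same streaming request shown above and write it for the CLI.
request = {
    "model": "reframr-v3",
    "stream": True,
    "messages": [
        {"role": "user", "content": "Write a short support reply."}
    ],
}
Path("request.json").write_text(json.dumps(request, indent=2))
```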
## Deployment Notes
- Keep real tools outside the model runtime and pass their outputs back as data.
- Treat source quality as part of the product: validate URLs, timestamps, permissions, and user access.
- Do not let the model fabricate tool results. If no tool result exists for a fresh fact, the app should ask for retrieval or return an uncertainty-aware answer.
- Use `session_id` with `python -m reframr serve` when you want conversation memory in the JSONL server.
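Source validation can start as a simple structural check before a tool result is fed back as `<source>` evidence. A minimal sketch (the required fields follow the `web_search` example above; the HTTPS-only policy and the helper name `is_valid_source` are assumptions, and real deployments would also check timestamps and user permissions):

```python
from urllib.parse import urlparse

REQUIRED_FIELDS = ("title", "url", "snippet")


def is_valid_source(source: dict) -> bool:
    """Reject sources with missing fields or non-HTTPS URLs before use as evidence."""
    if not all(source.get(field) for field in REQUIRED_FIELDS):
        return False
    parsed = urlparse(str(source["url"]))
    return parsed.scheme == "https" and bool(parsed.netloc)
```

Anything that fails the check can be dropped or surfaced to the user as an unverified source rather than cited by the model.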