# Reframr OpenAI-Compatible Runtime
|
|
The Reframr v3 runtime includes an OpenAI-style adapter, so apps can plug Reframr into existing chat, support, and tool-orchestration systems without writing custom prompt glue.
|
|
## Chat Completion
|
|
```python
from pathlib import Path

from reframr import ReframrModel, build_chat_completion_response

model = ReframrModel.load(Path("model.safetensors"))

response = build_chat_completion_response(
    model,
    {
        "model": "reframr-v3",
        "messages": [
            {"role": "system", "content": "Be concise and cite sources when tool results are provided."},
            {"role": "user", "content": "Summarize this customer support issue."},
        ],
        "max_tokens": 160,
        "temperature": 0.58,
    },
)

print(response["choices"][0]["message"]["content"])
```
|
|
## Streaming
|
|
```python
from reframr.openai_compat import iter_sse_chat_completion

for event in iter_sse_chat_completion(model, request):
    send_to_browser(event)
```
|
|
The stream emits OpenAI-style `chat.completion.chunk` SSE events and ends with:
|
|
```text
data: [DONE]
```
|
|
## Tool Loop
|
|
Register real tools in the host application. Reframr can request a tool with `<tool_call>`, the host executes the function, and the result is fed back as `<tool_result>` / `<source>` evidence.
|
|
```python
from reframr.openai_compat import run_tool_loop

def web_search(arguments: dict[str, object]) -> dict[str, object]:
    query = str(arguments["query"])
    result = your_search_client.search(query)
    return {
        "ok": True,
        "source": {
            "title": result.title,
            "url": result.url,
            "snippet": result.snippet,
        },
    }

response = run_tool_loop(
    model,
    {
        "model": "reframr-v3",
        "messages": [
            {"role": "user", "content": "What changed in the latest official release notes?"}
        ],
    },
    tools={"web.search": web_search},
    max_rounds=3,
)
```
|
|
If a tool is missing or fails, the adapter sends the failure back as a tool result instead of crashing. That lets Reframr answer honestly, retry with a different tool if the model requests one, or ask the user for source evidence.
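You can get the same behavior for exceptions raised inside your own tool functions by wrapping each callable before registering it. The wrapper below is a sketch, not part of `reframr.openai_compat`; the `ok` / `error` fields mirror the result shape used in the example above:

```python
from collections.abc import Callable

ToolFn = Callable[[dict[str, object]], dict[str, object]]

def safe_tool(fn: ToolFn) -> ToolFn:
    """Wrap a tool so failures come back as data instead of exceptions."""
    def wrapped(arguments: dict[str, object]) -> dict[str, object]:
        try:
            return fn(arguments)
        except Exception as exc:
            # The model sees the failure as a tool result and can retry or hedge.
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return wrapped
```

Registering `tools={"web.search": safe_tool(web_search)}` keeps a host-side bug from taking down the whole request.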
|
|
## CLI
|
|
```bash
python -m reframr chat-completion --model model.safetensors < request.json
```
|
|
For SSE output, send a request with `"stream": true`:
|
|
```json
{
  "model": "reframr-v3",
  "stream": true,
  "messages": [
    {"role": "user", "content": "Write a short support reply."}
  ]
}
```
|
|
## Deployment Notes
|
|
- Keep real tools outside the model runtime and pass their outputs back as data.
- Treat source quality as part of the product: validate URLs, timestamps, permissions, and user access.
- Do not let the model fabricate tool results. If no tool result exists for a fresh fact, the app should ask for retrieval or return an uncertainty-aware answer.
- Use `session_id` with `python -m reframr serve` when you want conversation memory in the JSONL server.
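As a sketch of the last point, a JSONL client sends one JSON object per line. Everything here except `session_id` and `messages` mirrors the chat-completion requests above, and the specific field values are hypothetical:

```python
import json

# Hypothetical follow-up turn for the JSONL server; reusing the same
# session_id across requests is what enables conversation memory.
request = {
    "model": "reframr-v3",
    "session_id": "support-ticket-4821",
    "messages": [
        {"role": "user", "content": "And what did the customer ask for?"}
    ],
}
line = json.dumps(request)  # one request per line over the JSONL transport
```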
|
|