# Reframr OpenAI-Compatible Runtime

The Reframr v3 runtime includes an OpenAI-style adapter so applications can plug Reframr into existing chat, support, and tool-orchestration systems without writing custom prompt glue.

## Chat Completion

```python
from pathlib import Path

from reframr import ReframrModel, build_chat_completion_response

model = ReframrModel.load(Path("model.safetensors"))

response = build_chat_completion_response(
    model,
    {
        "model": "reframr-v3",
        "messages": [
            {"role": "system", "content": "Be concise and cite sources when tool results are provided."},
            {"role": "user", "content": "Summarize this customer support issue."},
        ],
        "max_tokens": 160,
        "temperature": 0.58,
    },
)

print(response["choices"][0]["message"]["content"])
```

## Streaming

```python
from reframr.openai_compat import iter_sse_chat_completion

# `request` is a chat-completion request dict like the one above.
for event in iter_sse_chat_completion(model, request):
    send_to_browser(event)
```

The stream emits OpenAI-style `chat.completion.chunk` SSE events and ends with:

```
data: [DONE]
```
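On the client side, those events can be decoded with any OpenAI-compatible SSE parser. A minimal sketch of accumulating the streamed text is below; the exact `data: ` framing and the `choices[0].delta` chunk shape are assumptions based on the OpenAI chunk format, not a documented Reframr contract:

```python
import json

def accumulate_sse(events: list[str]) -> str:
    """Collect streamed delta text from OpenAI-style SSE data lines."""
    parts: list[str] = []
    for line in events:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # terminal sentinel; not JSON, so stop before parsing
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Hand-written chunks in the assumed OpenAI chunk shape:
events = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(accumulate_sse(events))  # Hello
```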

## Tool Loop

Register real tools in the host application. Reframr can request a tool with `<tool_call>`, the host executes the function, and the result is fed back as `<tool_result>` / `<source>` evidence.

```python
from reframr.openai_compat import run_tool_loop

def web_search(arguments: dict[str, object]) -> dict[str, object]:
    # your_search_client is a placeholder for your real search SDK.
    query = str(arguments["query"])
    result = your_search_client.search(query)
    return {
        "ok": True,
        "source": {
            "title": result.title,
            "url": result.url,
            "snippet": result.snippet,
        },
    }

response = run_tool_loop(
    model,
    {
        "model": "reframr-v3",
        "messages": [
            {"role": "user", "content": "What changed in the latest official release notes?"}
        ],
    },
    tools={"web.search": web_search},
    max_rounds=3,
)
```

If a tool is missing or fails, the adapter sends the failure back as a tool result instead of crashing. That lets Reframr answer honestly, retry with a different tool if the model requests one, or ask the user for source evidence.
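The same defensive pattern is worth applying on the host side before a tool ever reaches the loop. A sketch of a wrapper that converts tool exceptions into structured failure data is below; the `ok`/`error` result shape mirrors the `web_search` example above, and the wrapper itself is illustrative, not part of the Reframr API:

```python
from typing import Callable

ToolFn = Callable[[dict[str, object]], dict[str, object]]

def safe_tool(fn: ToolFn) -> ToolFn:
    """Wrap a tool so failures come back as data instead of exceptions."""
    def wrapped(arguments: dict[str, object]) -> dict[str, object]:
        try:
            return fn(arguments)
        except Exception as exc:  # report the failure, don't crash the loop
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return wrapped

@safe_tool
def flaky_search(arguments: dict[str, object]) -> dict[str, object]:
    # Simulated backend failure for illustration.
    raise TimeoutError("search backend unreachable")

print(flaky_search({"query": "release notes"}))
# {'ok': False, 'error': 'TimeoutError: search backend unreachable'}
```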

## CLI

```shell
python -m reframr chat-completion --model model.safetensors < request.json
```

For SSE output, set `"stream": true` in the request JSON:

```json
{
  "model": "reframr-v3",
  "stream": true,
  "messages": [
    {"role": "user", "content": "Write a short support reply."}
  ]
}
```

## Deployment Notes

- Keep real tools outside the model runtime and pass their outputs back as data.
- Treat source quality as part of the product: validate URLs, timestamps, permissions, and user access.
- Do not let the model fabricate tool results. If no tool result exists for a fresh fact, the app should ask for retrieval or return an uncertainty-aware answer.
- Use `session_id` with `python -m reframr serve` when you want conversation memory in the JSONL server.
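For illustration, a JSONL request line with session memory might look like the following, assuming the serve protocol accepts the same chat-completion fields plus a top-level `session_id` (the field placement here is an assumption, not a documented contract):

```json
{"session_id": "support-4821", "model": "reframr-v3", "messages": [{"role": "user", "content": "Continue the earlier thread."}]}
```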