# Reframr OpenAI-Compatible Runtime

The Reframr v3 runtime includes an OpenAI-style adapter, so applications can plug Reframr into existing chat, support, and tool-orchestration systems without writing custom prompt glue.

## Chat Completion

```python
from pathlib import Path

from reframr import ReframrModel, build_chat_completion_response

model = ReframrModel.load(Path("model.safetensors"))

response = build_chat_completion_response(
    model,
    {
        "model": "reframr-v3",
        "messages": [
            {"role": "system", "content": "Be concise and cite sources when tool results are provided."},
            {"role": "user", "content": "Summarize this customer support issue."},
        ],
        "max_tokens": 160,
        "temperature": 0.58,
    },
)

print(response["choices"][0]["message"]["content"])
```

## Streaming

```python
from reframr.openai_compat import iter_sse_chat_completion

# `model` is the loaded ReframrModel from above; `request` is an
# OpenAI-style request dict like the one in the Chat Completion example.
for event in iter_sse_chat_completion(model, request):
    send_to_browser(event)
```

The stream emits OpenAI-style `chat.completion.chunk` SSE events and ends with:

```text
data: [DONE]
```
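On the client side, the chunk events can be reassembled into the full reply. A minimal sketch, assuming each event is a `data: {json}` SSE line carrying an OpenAI-style `chat.completion.chunk` with `delta` payloads (the exact event shape produced by `iter_sse_chat_completion` may differ; adapt the parsing to your transport):

```python
import json


def collect_streamed_text(events):
    """Accumulate assistant text from OpenAI-style chat.completion.chunk events.

    Assumes each event is a string of the form "data: {json}" and the
    stream ends with "data: [DONE]".
    """
    parts = []
    for event in events:
        payload = event.removeprefix("data: ").strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        # Role-only chunks (the first delta) carry no "content" key.
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```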

## Tool Loop

Register real tools in the host application. Reframr can request a tool with `<tool_call>`, the host executes the function, and the result is fed back as `<tool_result>` / `<source>` evidence.

```python
from reframr.openai_compat import run_tool_loop

def web_search(arguments: dict[str, object]) -> dict[str, object]:
    query = str(arguments["query"])
    # your_search_client is a placeholder for the host's real search integration.
    result = your_search_client.search(query)
    return {
        "ok": True,
        "source": {
            "title": result.title,
            "url": result.url,
            "snippet": result.snippet,
        },
    }

response = run_tool_loop(
    model,
    {
        "model": "reframr-v3",
        "messages": [
            {"role": "user", "content": "What changed in the latest official release notes?"}
        ],
    },
    tools={"web.search": web_search},
    max_rounds=3,
)
```

If a tool is missing or fails, the adapter sends the failure back as a tool result instead of crashing. That lets Reframr answer honestly, retry with a different tool if the model requests one, or ask the user for source evidence.
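The same failure-as-data pattern can be applied on the host side before the adapter ever sees an exception. A minimal sketch of a wrapper (a hypothetical helper, not part of the adapter's API) that converts tool exceptions into structured tool results:

```python
def safe_tool(fn):
    """Wrap a tool so failures become structured results instead of exceptions.

    Hypothetical helper: use it when you want custom error payloads rather
    than relying on the adapter's default failure handling.
    """
    def wrapped(arguments: dict[str, object]) -> dict[str, object]:
        try:
            return fn(arguments)
        except Exception as exc:
            # Report the failure as evidence the model can reason about.
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return wrapped
```

Registered the same way as any other tool, e.g. `tools={"web.search": safe_tool(web_search)}`.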

## CLI

```bash
python -m reframr chat-completion --model model.safetensors < request.json
```

For SSE output, pass a request with `"stream": true`:

```json
{
  "model": "reframr-v3",
  "stream": true,
  "messages": [
    {"role": "user", "content": "Write a short support reply."}
  ]
}
```

## Deployment Notes

- Keep real tools outside the model runtime and pass their outputs back as data.
- Treat source quality as part of the product: validate URLs, timestamps, permissions, and user access.
- Do not let the model fabricate tool results. If no tool result exists for a fresh fact, the app should ask for retrieval or return an uncertainty-aware answer.
- Use `session_id` with `python -m reframr serve` when you want conversation memory in the JSONL server.
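The source-quality note above can be enforced with a small gate before tool results are fed back to the model. A minimal sketch, assuming the `source` field shape from the `web_search` example; `validate_source` is a hypothetical helper, and real deployments would also check timestamps and user permissions:

```python
from urllib.parse import urlparse


def validate_source(source: dict[str, object]) -> bool:
    """Minimal source-quality gate: require a title, a snippet, and an https URL."""
    parsed = urlparse(str(source.get("url", "")))
    return (
        bool(source.get("title"))
        and bool(source.get("snippet"))
        and parsed.scheme == "https"
        and bool(parsed.netloc)
    )
```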