# Reframr OpenAI-Compatible Runtime

The Reframr v3 runtime includes an OpenAI-style adapter, so applications can plug Reframr into existing chat, support, and tool-orchestration systems without writing custom prompt glue.

## Chat Completion

```python
from pathlib import Path

from reframr import ReframrModel, build_chat_completion_response

model = ReframrModel.load(Path("model.safetensors"))

response = build_chat_completion_response(
    model,
    {
        "model": "reframr-v3",
        "messages": [
            {"role": "system", "content": "Be concise and cite sources when tool results are provided."},
            {"role": "user", "content": "Summarize this customer support issue."},
        ],
        "max_tokens": 160,
        "temperature": 0.58,
    },
)

print(response["choices"][0]["message"]["content"])
```

## Streaming

```python
from reframr.openai_compat import iter_sse_chat_completion

# `model` is the loaded ReframrModel from above; `request` is an
# OpenAI-style request dict like the one in the Chat Completion example.
for event in iter_sse_chat_completion(model, request):
    send_to_browser(event)
```

The stream emits OpenAI-style `chat.completion.chunk` SSE events and ends with:

```text
data: [DONE]
```
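On the client side, the chunk events can be reassembled into the full reply. A minimal sketch, assuming each event is a `data: {json}` SSE line carrying an OpenAI-style `chat.completion.chunk` with `delta` payloads (the exact event shape produced by `iter_sse_chat_completion` may differ; adapt the parsing to your transport):

```python
import json


def collect_streamed_text(events):
    """Accumulate assistant text from OpenAI-style chat.completion.chunk events.

    Assumes each event is a string of the form "data: {json}" and the
    stream ends with "data: [DONE]".
    """
    parts = []
    for event in events:
        payload = event.removeprefix("data: ").strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        # Role-only chunks (the first delta) carry no "content" key.
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```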

## Tool Loop

Register real tools in the host application. Reframr can request a tool with `<tool_call>`, the host executes the function, and the result is fed back as `<tool_result>` / `<source>` evidence.

```python
from reframr.openai_compat import run_tool_loop

def web_search(arguments: dict[str, object]) -> dict[str, object]:
    query = str(arguments["query"])
    # your_search_client is a placeholder for the host's real search integration.
    result = your_search_client.search(query)
    return {
        "ok": True,
        "source": {
            "title": result.title,
            "url": result.url,
            "snippet": result.snippet,
        },
    }

response = run_tool_loop(
    model,
    {
        "model": "reframr-v3",
        "messages": [
            {"role": "user", "content": "What changed in the latest official release notes?"}
        ],
    },
    tools={"web.search": web_search},
    max_rounds=3,
)
```

If a tool is missing or fails, the adapter sends the failure back as a tool result instead of crashing. That lets Reframr answer honestly, retry with a different tool if the model requests one, or ask the user for source evidence.
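The same failure-as-data pattern can be applied on the host side before the adapter ever sees an exception. A minimal sketch of a wrapper (a hypothetical helper, not part of the adapter's API) that converts tool exceptions into structured tool results:

```python
def safe_tool(fn):
    """Wrap a tool so failures become structured results instead of exceptions.

    Hypothetical helper: use it when you want custom error payloads rather
    than relying on the adapter's default failure handling.
    """
    def wrapped(arguments: dict[str, object]) -> dict[str, object]:
        try:
            return fn(arguments)
        except Exception as exc:
            # Report the failure as evidence the model can reason about.
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return wrapped
```

Registered the same way as any other tool, e.g. `tools={"web.search": safe_tool(web_search)}`.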

## CLI

```bash
python -m reframr chat-completion --model model.safetensors < request.json
```

For SSE output, pass a request with `"stream": true`:

```json
{
  "model": "reframr-v3",
  "stream": true,
  "messages": [
    {"role": "user", "content": "Write a short support reply."}
  ]
}
```

## Deployment Notes

- Keep real tools outside the model runtime and pass their outputs back as data.
- Treat source quality as part of the product: validate URLs, timestamps, permissions, and user access.
- Do not let the model fabricate tool results. If no tool result exists for a fresh fact, the app should ask for retrieval or return an uncertainty-aware answer.
- Use `session_id` with `python -m reframr serve` when you want conversation memory in the JSONL server.
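The source-quality note above can be enforced with a small gate before tool results are fed back to the model. A minimal sketch, assuming the `source` field shape from the `web_search` example; `validate_source` is a hypothetical helper, and real deployments would also check timestamps and user permissions:

```python
from urllib.parse import urlparse


def validate_source(source: dict[str, object]) -> bool:
    """Minimal source-quality gate: require a title, a snippet, and an https URL."""
    parsed = urlparse(str(source.get("url", "")))
    return (
        bool(source.get("title"))
        and bool(source.get("snippet"))
        and parsed.scheme == "https"
        and bool(parsed.netloc)
    )
```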