# API Capabilities

## Supported

- OpenAI-compatible `POST /v1/chat/completions`
- OpenAI-compatible `POST /v1/responses`
- OpenAI-compatible `GET /v1/models`
- Non-streaming Chat Completions responses containing:
  - plain assistant text
  - assistant `tool_calls`
- Responses API outputs with:
  - `output_text`
  - `function_call`, `apply_patch_call`, `shell_call`, `local_shell_call`
- Streaming Chat Completions responses containing:
  - `delta.content`
  - `delta.tool_calls`
- Responses API stream events:
  - `response.output_text.delta`
  - `response.function_call_arguments.delta`
- Multi-turn context via the `messages` array
- Tool request fields:
  - `tools`
  - `tool_choice`
  - `tool_calls` on assistant messages in the history
  - `tool_call_id` on `tool`-role messages in the history
- Automatically derived `*-thinking` public models, for example:
  - `claude-sonnet-4.6`
  - `claude-sonnet-4.6-thinking`
- Bearer token auth via `Authorization: Bearer <API_KEY>`
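The `*-thinking` variants are derived purely from the model name. Below is a client-side sketch of that naming rule. It assumes `GET /v1/models` returns the usual OpenAI list shape (`{"data": [{"id": ...}, ...]}`) and that `jq` is installed. `list_model_ids`, `split_model`, and `thinking_variants` are hypothetical helper names, not functions provided by this proxy.

```bash
#!/usr/bin/env bash
# Hypothetical defaults matching the curl examples in this document.
BASE_URL="${BASE_URL:-http://127.0.0.1:8002}"
API_KEY="${API_KEY:-0000}"

# Print every public model id, one per line.
# Assumes the OpenAI list shape {"data":[{"id":...}]} and that jq is installed.
list_model_ids() {
  curl -sS "$BASE_URL/v1/models" \
    -H "Authorization: Bearer $API_KEY" | jq -r '.data[].id'
}

# Split a public model id into "<base model> <thinking flag>".
# Client-side mirror of the naming rule: "X-thinking" maps back to base "X".
split_model() {
  local id="$1"
  if [[ "$id" == *-thinking ]]; then
    printf '%s true\n' "${id%-thinking}"
  else
    printf '%s false\n' "$id"
  fi
}

# Filter model ids on stdin, keeping only the derived *-thinking variants.
thinking_variants() {
  local id
  while IFS= read -r id || [[ -n "$id" ]]; do
    if [[ "$id" == *-thinking ]]; then
      printf '%s\n' "$id"
    fi
  done
}
```

For example, `list_model_ids | thinking_variants` keeps only the thinking names, and `split_model claude-sonnet-4.6-thinking` prints `claude-sonnet-4.6 true`.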

## Behavior Notes

- Tool support is implemented through an internal prompt-and-parser bridge, not Cursor-native tool calling.
- Base models keep their public model name and run with tool use enabled.
- `*-thinking` models map back to the same upstream base model and additionally enable the internal thinking protocol.
- Thinking is an internal bridge capability only. The public OpenAI response does not expose a separate reasoning field.
- `previous_response_id` is supported with in-memory state only, so restarting the server clears all stored responses. For stateless use, include the prior output items in `input`.
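A minimal sketch of chaining Responses API turns from the shell. `responses_body` builds a request that continues an earlier response through `previous_response_id`, or starts a new conversation when no id is given. It assumes the proxy accepts OpenAI-style message items in `input` (`{"role": "user", "content": ...}`). The helper names, the `MODEL` default, and the simplified `json_str` encoder (which escapes only backslashes and double quotes, so it assumes single-line text) are illustrative, not part of the proxy.

```bash
#!/usr/bin/env bash
# Hypothetical defaults matching the curl examples in this document.
BASE_URL="${BASE_URL:-http://127.0.0.1:8002}"
API_KEY="${API_KEY:-0000}"
MODEL="${MODEL:-claude-sonnet-4.6}"

# Encode text as a JSON string literal. Deliberately minimal: escapes only
# backslashes and double quotes, so it assumes single-line text.
json_str() {
  printf '"%s"' "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
}

# Build a /v1/responses body.
# $1: the next user message
# $2 (optional): id of an earlier response from the SAME running proxy;
#     ids live in memory only and are lost when the proxy restarts.
responses_body() {
  local text="$1" prev_id="${2:-}" prev=""
  if [[ -n "$prev_id" ]]; then
    prev="\"previous_response_id\":$(json_str "$prev_id"),"
  fi
  printf '{"model":%s,%s"input":[{"role":"user","content":%s}]}' \
    "$(json_str "$MODEL")" "$prev" "$(json_str "$text")"
}

# POST a body to /v1/responses and print the raw JSON reply.
post_responses() {
  curl -sS -X POST "$BASE_URL/v1/responses" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $API_KEY" \
    -d "$1"
}
```

Assuming `jq` and an OpenAI-style reply that carries its id in `.id`, a two-turn exchange could look like `id="$(post_responses "$(responses_body 'Hello')" | jq -r '.id')"` followed by `post_responses "$(responses_body 'And then?' "$id")"`. After a restart the proxy no longer knows that id, so the stateless route (resending earlier `output` items inside `input`) is the safer choice for long-lived clients.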

## Not Supported

- Anthropic `/v1/messages`
- MCP orchestration
- Native upstream OpenAI tool execution
- Exposed reasoning/thinking response fields
- Local filesystem or OS command execution through the API
- Persisted Responses API storage or server-side conversations across restarts

## Example: Non-Stream Tool Call

```bash
curl -X POST http://127.0.0.1:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 0000" \
  -d '{
    "model": "claude-sonnet-4.6",
    "stream": false,
    "messages": [
      {"role": "user", "content": "Check the weather in Beijing for me"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string"}
            },
            "required": ["city"]
          }
        }
      }
    ]
  }'
```
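The follow-up turn for the tool call above can be sketched as follows. It assumes the usual OpenAI Chat Completions convention: the client replays the assistant message that carried `tool_calls`, then appends a `tool`-role message whose `tool_call_id` matches that call's `id`. `tool_followup_body` and `post_chat` are hypothetical helpers, and the tool output used in the usage note is made-up sample data.

```bash
#!/usr/bin/env bash
# Hypothetical defaults matching the curl example above.
BASE_URL="${BASE_URL:-http://127.0.0.1:8002}"
API_KEY="${API_KEY:-0000}"

# Build the second-turn body for the get_weather example.
# $1: the assistant message from the first reply, as compact JSON
#     (it carries tool_calls[].id and the model's arguments)
# $2: the id of the tool call being answered
# $3: the tool's output, already encoded as a JSON string literal
tool_followup_body() {
  cat <<EOF
{
  "model": "claude-sonnet-4.6",
  "stream": false,
  "messages": [
    {"role": "user", "content": "Check the weather in Beijing for me"},
    $1,
    {"role": "tool", "tool_call_id": "$2", "content": $3}
  ],
  "tools": [
    {"type": "function", "function": {"name": "get_weather",
      "description": "Get current weather",
      "parameters": {"type": "object",
        "properties": {"city": {"type": "string"}}, "required": ["city"]}}}
  ]
}
EOF
}

# POST a body to /v1/chat/completions and print the raw JSON reply.
post_chat() {
  curl -sS -X POST "$BASE_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $API_KEY" \
    -d "$1"
}
```

With `jq`, one way to wire it up: `msg="$(post_chat "$first_body" | jq -c '.choices[0].message')"` (where `$first_body` holds the JSON from the example above), then `call_id="$(jq -r '.tool_calls[0].id' <<<"$msg")"`, and finally `post_chat "$(tool_followup_body "$msg" "$call_id" '"Sunny, 25°C"')"`. Replaying the assistant message verbatim keeps the model's own arguments in the history.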

## Example: Thinking Model

```bash
curl -X POST http://127.0.0.1:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer 0000" \
  -d '{
    "model": "claude-sonnet-4.6-thinking",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Think first, then decide whether a tool is needed"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "lookup",
          "parameters": {
            "type": "object",
            "properties": {
              "q": {"type": "string"}
            },
            "required": ["q"]
          }
        }
      }
    ]
  }'
```
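Because the thinking protocol stays inside the bridge, the stream from a `*-thinking` model is read exactly like a base-model stream: only `delta.content` and `delta.tool_calls` reach the client. A minimal reader, assuming the usual OpenAI SSE framing (`data: <json>` lines, terminated by `data: [DONE]`). `sse_payloads` and `stream_chat` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical defaults matching the curl examples in this document.
BASE_URL="${BASE_URL:-http://127.0.0.1:8002}"
API_KEY="${API_KEY:-0000}"

# Read SSE lines on stdin and print each JSON payload, one per line.
# Assumes OpenAI-style framing: "data: <json>" lines, ending with "data: [DONE]".
sse_payloads() {
  local line
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="${line%$'\r'}"                 # tolerate CRLF line endings
    [[ "$line" == data:* ]] || continue  # skip blank lines, comments, "event:" lines
    line="${line#data:}"
    line="${line# }"                     # the space after "data:" is optional in SSE
    [[ "$line" == "[DONE]" ]] && break
    printf '%s\n' "$line"
  done
}

# Stream a chat completion; -N turns off curl's output buffering.
stream_chat() {
  curl -sS -N -X POST "$BASE_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $API_KEY" \
    -d "$1" | sse_payloads
}
```

Assuming `jq`, `stream_chat "$body" | jq -j '.choices[0].delta.content // empty'` prints the text as it arrives. In OpenAI's streaming format, tool-call arguments arrive as fragments under `.choices[0].delta.tool_calls` and must be concatenated per `index`.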