File size: 5,448 Bytes
265686c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
# Gemini Agentic Patterns

## Models for NeuralCAD

| Model | Strengths | Context | Cost (1M tokens in/out) |
|-------|-----------|---------|-------------------------|
| gemini-2.5-flash | Best price-performance, reasoning | 1M | Lowest tier |
| gemini-2.5-pro | Deep reasoning, proven | 1M | $1.25 / $10 |
| gemini-3-flash-preview | Frontier, balanced | 1M | $0.50 / $3 |
| gemini-3.1-pro-preview | Best reasoning, complex agentic | 1M | $4 / $18 |

NeuralCAD uses `gemini-2.5-flash` by default -- 5-10x cheaper than frontier models for multi-agent pipelines (~$0.05 per user turn with 8 API calls).

## Function Calling

### Declaration

```python
from google import genai
from google.genai import types

tools = types.Tool(function_declarations=[{
    "name": "generate_toolpath",
    "description": "Generate CNC toolpath from CadQuery shape",
    "parameters": {
        "type": "object",
        "properties": {
            "operations": {
                "type": "array",
                "items": {"type": "string",
                          "enum": ["pocket", "profile", "drill", "adaptive"]}
            },
            "tool_diameter": {"type": "number"},
            "post_processor": {"type": "string",
                               "enum": ["grbl", "linuxcnc", "fanuc"]}
        },
        "required": ["operations", "tool_diameter"]
    }
}])
```

### Call-Response Loop

```python
config = types.GenerateContentConfig(tools=[tools])
response = client.models.generate_content(
    model="gemini-2.5-flash", contents=prompt, config=config
)

for part in response.candidates[0].content.parts:
    if part.function_call:
        fn = part.function_call
        result = execute_function(fn.name, **fn.args)
        function_response = types.Part.from_function_response(
            name=fn.name, response={"result": result}, id=fn.id
        )
        # Send result back to continue conversation
```

### Modes

| Mode | Behavior |
|------|----------|
| AUTO | Model decides whether to call a function or respond with text (default) |
| ANY | Model forced to always call a function |
| VALIDATED | Must call functions OR text, schema adherence enforced |
| NONE | Function calling disabled |

`allowed_function_names` restricts which functions the model can select per turn.

### Parallel Calls

Gemini can request multiple independent function calls in a single response. Each has a unique `id`. Execute all, return all results mapped by `id` in one response.

### Limits

- Recommend 10-20 active tools maximum per call
- Temperature 1.0 recommended (lower values cause unexpected behavior)
- Function descriptions count toward input token limits

## Thinking / Reasoning

### Gemini 2.5 (thinking_budget)

```python
config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=8192)
)
```

| Model | Range | Default |
|-------|-------|---------|
| 2.5 Pro | 128-32,768 tokens | Dynamic (cannot disable) |
| 2.5 Flash | 0-24,576 tokens | Dynamic; disable with 0 |

### Gemini 3.x (thinking_level)

```python
config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_level="high")
)
```

Levels: minimal, low, medium, high. Default is "high" for 3 Flash and 3.1 Pro.

Thinking tokens are **billed as output tokens**. Check `response.usage_metadata.thoughts_token_count`.

Thinking "significantly improves function calling performance" -- better tool selection and parameter inference.

## Code Execution

Built-in sandbox runs Python with 43+ libraries (numpy, scipy, matplotlib, sympy, etc.). **CadQuery is NOT available** in the sandbox. Useful for mathematical calculations (feed/speed, chip load), not for CAD execution. Max 30 seconds per execution.

## Google Agent Development Kit (ADK)

```python
from google.adk import LlmAgent, SequentialAgent, ParallelAgent, LoopAgent

# Sequential pipeline
pipeline = SequentialAgent(sub_agents=[
    LlmAgent(name="Designer", output_key="design_spec", ...),
    LlmAgent(name="Engineer", output_key="eng_spec", ...),
    LlmAgent(name="CADCoder", output_key="cad_code", tools=[generate_cadquery], ...),
])

# Self-correction loop
loop = LoopAgent(
    sub_agents=[cad_generator, code_validator],
    condition_key="feedback", exit_condition="PASS", max_iterations=3
)

# Parallel validation
reviews = ParallelAgent(sub_agents=[cnc_check, structural_check, cost_check])
```

ADK provides native Gemini integration, multi-language support (Python/TypeScript/Go/Java), and deployment to Vertex AI.

## Agentic Architecture for CNC

### Full Pipeline with Gemini

```
User Prompt
  -> Gemini (thinking=high) -> routes to agents via function calling
  -> Design/Engineering/CNC agents provide structured feedback
  -> CAD Coder generates CadQuery code (thinking_budget=8192)
  -> Execute CadQuery on server
  -> If error: feed back to Gemini -> retry (up to 3 times)
  -> If success: validate_for_cnc
  -> CAM Agent selects operations/tools
  -> generate_toolpath via ocp-freecad-cam
  -> Return G-code + 3D preview data
```

### Self-Correction Performance

Gemini 2.0 Flash CadQuery generation: 53% success zero-shot, **85% with error feedback loop**. NeuralCAD already implements this in `_execute_cad_code()`.

### Cost per Turn (8 API calls, ~4K in + ~2K out each)

| Model | Total per turn |
|-------|---------------|
| Gemini 2.5 Flash | $0.05 |
| Gemini 2.5 Pro | $0.26 |
| Claude Sonnet 4.6 | $0.34 |
| GPT-5.4 | $0.32 |