# Gemini Agentic Patterns
## Models for NeuralCAD
| Model | Strengths | Context | Cost (1M tokens in/out) |
|-------|-----------|---------|-------------------------|
| gemini-2.5-flash | Best price-performance, reasoning | 1M | Lowest tier |
| gemini-2.5-pro | Deep reasoning, proven | 1M | $1.25 / $10 |
| gemini-3-flash-preview | Frontier, balanced | 1M | $0.50 / $3 |
| gemini-3.1-pro-preview | Best reasoning, complex agentic | 1M | $4 / $18 |
NeuralCAD uses `gemini-2.5-flash` by default -- 5-10x cheaper than frontier models for multi-agent pipelines (~$0.05 per user turn with 8 API calls).
## Function Calling
### Declaration
```python
from google import genai
from google.genai import types

# Declare one callable tool; parameters are described with a JSON-schema-style dict.
tools = types.Tool(function_declarations=[{
    "name": "generate_toolpath",
    "description": "Generate CNC toolpath from CadQuery shape",
    "parameters": {
        "type": "object",
        "properties": {
            "operations": {
                "type": "array",
                "items": {"type": "string",
                          "enum": ["pocket", "profile", "drill", "adaptive"]}
            },
            "tool_diameter": {"type": "number"},
            "post_processor": {"type": "string",
                               "enum": ["grbl", "linuxcnc", "fanuc"]}
        },
        "required": ["operations", "tool_diameter"]
    }
}])
```
### Call-Response Loop
```python
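client = genai.Client()  # reads the API key from the environment
config = types.GenerateContentConfig(tools=[tools])
response = client.models.generate_content(
    model="gemini-2.5-flash", contents=prompt, config=config
)

function_responses = []
for part in response.candidates[0].content.parts:
    if part.function_call:
        fn = part.function_call
        # execute_function is the application's own dispatcher (not part of the SDK)
        result = execute_function(fn.name, **fn.args)
        # Build the response part explicitly so the call id is carried back
        function_responses.append(types.Part(
            function_response=types.FunctionResponse(
                id=fn.id, name=fn.name, response={"result": result}
            )
        ))
# Send function_responses back as the next turn to continue the conversation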
```
### Modes
| Mode | Behavior |
|------|----------|
| AUTO | Model decides whether to call a function or respond with text (default) |
| ANY | Model forced to always call a function |
| VALIDATED | Model either calls a function or responds with text; any function call must adhere to the declared schema |
| NONE | Function calling disabled |
`allowed_function_names` restricts which functions the model can select per turn.
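As a rough illustration of how a mode and an allow-list fit together, the sketch below builds the tool configuration as a plain dict. It assumes the `google-genai` SDK's acceptance of dicts in place of typed config objects, with field names mirroring its `ToolConfig` and `FunctionCallingConfig` types; `build_tool_config` is a hypothetical helper, not an SDK function.

```python
VALID_MODES = {"AUTO", "ANY", "VALIDATED", "NONE"}

def build_tool_config(mode="AUTO", allowed_function_names=None):
    """Return a dict shaped like the SDK's ToolConfig (hypothetical helper)."""
    mode = mode.upper()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown function calling mode: {mode}")
    fc_config = {"mode": mode}
    if allowed_function_names:
        # Restrict which declared functions the model may select this turn
        fc_config["allowed_function_names"] = list(allowed_function_names)
    return {"function_calling_config": fc_config}

# Force a call to generate_toolpath on this turn
tool_config = build_tool_config("any", ["generate_toolpath"])
```

The resulting dict would be passed as `tool_config` next to `tools` in the generation config. Check the current API reference for which modes accept an allow-list before combining the two.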
### Parallel Calls
Gemini can request multiple independent function calls in a single response. Each has a unique `id`. Execute all, return all results mapped by `id` in one response.
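The execute-all, return-all pattern can be sketched without the SDK. In this minimal example `FunctionCall` is a stand-in for the SDK's function call part, and the handlers (`estimate_cost`, `check_tool`) and their formulas are hypothetical placeholders; only the pairing of each result with its `id` is the point.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

@dataclass
class FunctionCall:
    """Stand-in for a function call part returned by the model."""
    id: str
    name: str
    args: dict = field(default_factory=dict)

# Hypothetical local implementations keyed by function name
HANDLERS = {
    "estimate_cost": lambda material, volume_cm3: round(volume_cm3 * 0.02, 2),
    "check_tool": lambda tool_diameter: tool_diameter <= 12.0,
}

def run_call(call):
    """Execute one call and return a response record carrying the same id."""
    try:
        payload = {"result": HANDLERS[call.name](**call.args)}
    except Exception as exc:  # report failures to the model instead of crashing
        payload = {"error": str(exc)}
    return {"id": call.id, "name": call.name, "response": payload}

def run_parallel(calls):
    """Run independent calls concurrently; results keep the request order."""
    with ThreadPoolExecutor(max_workers=len(calls) or 1) as pool:
        return list(pool.map(run_call, calls))

calls = [
    FunctionCall("call-1", "estimate_cost", {"material": "6061", "volume_cm3": 150}),
    FunctionCall("call-2", "check_tool", {"tool_diameter": 6.0}),
]
responses = run_parallel(calls)
```

Each record in `responses` would then be converted into a function response part and sent back together in a single turn.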
### Limits
- Keep to 10-20 active tools per call
- Temperature 1.0 is recommended (lower values can cause unexpected behavior)
- Function descriptions count toward input token limits
## Thinking / Reasoning
### Gemini 2.5 (thinking_budget)
```python
config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=8192)
)
```
| Model | Range | Default |
|-------|-------|---------|
| 2.5 Pro | 128-32,768 tokens | Dynamic (cannot disable) |
| 2.5 Flash | 0-24,576 tokens | Dynamic; disable with 0 |
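A small guard can keep a requested budget inside the ranges in the table above. This is a sketch under the assumption that those ranges are current; `clamp_thinking_budget` is a hypothetical helper, and `None` is used here to mean "leave the budget unset and keep the dynamic default".

```python
# (minimum, maximum, can_disable) taken from the table above
BUDGET_RANGES = {
    "gemini-2.5-pro": (128, 32768, False),
    "gemini-2.5-flash": (0, 24576, True),
}

def clamp_thinking_budget(model, requested):
    """Clamp a requested thinking budget to the model's supported range."""
    if requested is None:
        return None  # caller leaves thinking_budget unset (dynamic default)
    low, high, can_disable = BUDGET_RANGES[model]
    if requested <= 0:
        # 0 disables thinking only where the model allows it
        return 0 if can_disable else low
    return max(low, min(high, requested))

flash_budget = clamp_thinking_budget("gemini-2.5-flash", 100_000)
pro_budget = clamp_thinking_budget("gemini-2.5-pro", 0)
```

Clamping silently is a design choice; raising an error on out-of-range values may be preferable when budgets come from user configuration.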
### Gemini 3.x (thinking_level)
```python
config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_level="high")
)
```
Levels: minimal, low, medium, high. Default is "high" for 3 Flash and 3.1 Pro.
Thinking tokens are **billed as output tokens**. Check `response.usage_metadata.thoughts_token_count`.
Thinking "significantly improves function calling performance" -- better tool selection and parameter inference.
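Because thinking tokens are billed as output, a per-response cost estimate should add them to the visible output tokens. The sketch below assumes `candidates_token_count` excludes thinking tokens; the `usage` object is a stand-in for `response.usage_metadata`, and the token counts and the $10 per 1M output price are illustrative values only.

```python
from types import SimpleNamespace

def billed_output_tokens(usage):
    """Visible output plus thinking tokens; missing fields count as zero."""
    visible = getattr(usage, "candidates_token_count", None) or 0
    thoughts = getattr(usage, "thoughts_token_count", None) or 0
    return visible + thoughts

def output_cost_usd(usage, usd_per_million_output):
    """Estimated output-side cost of one response in US dollars."""
    return billed_output_tokens(usage) * usd_per_million_output / 1_000_000

# Stand-in for response.usage_metadata (illustrative numbers)
usage = SimpleNamespace(candidates_token_count=2000, thoughts_token_count=6000)
billed = billed_output_tokens(usage)
cost = output_cost_usd(usage, 10.0)
```

In this illustration the thinking tokens account for three quarters of the billed output, which is why capping the budget matters in multi-call pipelines.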
## Code Execution
Built-in sandbox runs Python with 43+ libraries (numpy, scipy, matplotlib, sympy, etc.). **CadQuery is NOT available** in the sandbox. Useful for mathematical calculations (feed/speed, chip load), not for CAD execution. Max 30 seconds per execution.
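The kind of calculation that suits the sandbox looks like the following sketch, which uses the standard milling relations: spindle speed = 1000 x cutting speed / (pi x tool diameter), and feed rate = spindle speed x flutes x chip load. The cutting speed, tool, and chip load values are hypothetical and not taken from NeuralCAD.

```python
import math

def spindle_rpm(cutting_speed_m_min, tool_diameter_mm):
    """Spindle speed in rev/min from surface speed (m/min) and diameter (mm)."""
    return (cutting_speed_m_min * 1000.0) / (math.pi * tool_diameter_mm)

def feed_rate_mm_min(rpm, flutes, chip_load_mm):
    """Table feed in mm/min from spindle speed, flute count, and chip load per tooth."""
    return rpm * flutes * chip_load_mm

# Hypothetical example: 6 mm two-flute end mill at 200 m/min surface speed
rpm = spindle_rpm(cutting_speed_m_min=200.0, tool_diameter_mm=6.0)
feed = feed_rate_mm_min(rpm, flutes=2, chip_load_mm=0.05)
```

Real feeds and speeds should also be capped at the machine's spindle and axis limits before they reach a toolpath.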
## Google Agent Development Kit (ADK)
```python
# Pseudocode: "..." marks elided arguments
from google.adk.agents import LlmAgent, SequentialAgent, ParallelAgent, LoopAgent

# Sequential pipeline: each agent stores its output under output_key
pipeline = SequentialAgent(sub_agents=[
    LlmAgent(name="Designer", output_key="design_spec", ...),
    LlmAgent(name="Engineer", output_key="eng_spec", ...),
    LlmAgent(name="CADCoder", output_key="cad_code", tools=[generate_cadquery], ...),
])

# Self-correction loop
# condition_key / exit_condition are shorthand for the exit rule; in ADK a
# sub-agent ends the loop by escalating, otherwise it stops at max_iterations
loop = LoopAgent(
    sub_agents=[cad_generator, code_validator],
    condition_key="feedback", exit_condition="PASS", max_iterations=3
)

# Parallel validation
reviews = ParallelAgent(sub_agents=[cnc_check, structural_check, cost_check])
```
ADK provides native Gemini integration, multi-language support (Python/TypeScript/Go/Java), and deployment to Vertex AI.
## Agentic Architecture for CNC
### Full Pipeline with Gemini
```
User Prompt
  -> Gemini (thinking=high) -> routes to agents via function calling
  -> Design/Engineering/CNC agents provide structured feedback
  -> CAD Coder generates CadQuery code (thinking_budget=8192)
  -> Execute CadQuery on server
     -> If error: feed back to Gemini -> retry (up to 3 times)
     -> If success: validate_for_cnc
  -> CAM Agent selects operations/tools
  -> generate_toolpath via ocp-freecad-cam
  -> Return G-code + 3D preview data
```
### Self-Correction Performance
Gemini 2.0 Flash CadQuery generation: 53% success zero-shot, **85% with error feedback loop**. NeuralCAD already implements this in `_execute_cad_code()`.
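The general shape of an error feedback loop is sketched below. This is not NeuralCAD's `_execute_cad_code()`; `generate_with_feedback` and the two stand-in functions are hypothetical, with `fake_generate` playing the model call and `fake_execute` playing the CadQuery executor.

```python
def generate_with_feedback(generate, execute, prompt, max_attempts=3):
    """Retry code generation, feeding each execution error back to the generator."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt, feedback)
        try:
            return {"ok": True, "attempts": attempt, "result": execute(code)}
        except Exception as exc:
            # The error text becomes extra context for the next attempt
            feedback = f"{type(exc).__name__}: {exc}"
    return {"ok": False, "attempts": max_attempts, "error": feedback}

# Stand-in for the model: produces broken code until it sees feedback
def fake_generate(prompt, feedback):
    return "box(10, 10, 10)" if feedback else "box(10, 10"

# Stand-in for the executor: rejects unbalanced parentheses
def fake_execute(code):
    if code.count("(") != code.count(")"):
        raise SyntaxError("unbalanced parentheses")
    return f"solid from {code}"

outcome = generate_with_feedback(fake_generate, fake_execute, "10 mm cube")
```

Here the first attempt fails, the error text is passed to the second attempt, and the loop returns after two attempts instead of exhausting all three.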
### Cost per Turn (8 API calls, ~4K in + ~2K out each)
| Model | Total per turn |
|-------|---------------|
| Gemini 2.5 Flash | $0.05 |
| Gemini 2.5 Pro | $0.26 |
| Claude Sonnet 4.6 | $0.34 |
| GPT-5.4 | $0.32 |
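The arithmetic behind a per-turn figure can be sketched as follows, using the token mix from the heading (8 calls, about 4K input and 2K output tokens each). `cost_per_turn` is a hypothetical helper, and the example plugs in the `gemini-3-flash-preview` prices from the models table ($0.50 / $3 per 1M tokens).

```python
def cost_per_turn(calls, tokens_in, tokens_out, usd_in_per_m, usd_out_per_m):
    """Estimated cost in USD of one user turn made of identical API calls."""
    per_call = (tokens_in * usd_in_per_m + tokens_out * usd_out_per_m) / 1_000_000
    return calls * per_call

# 8 calls, 4K in + 2K out each, at $0.50 / $3.00 per 1M tokens
estimate = cost_per_turn(8, 4000, 2000, 0.50, 3.00)
```

This covers visible input and output tokens only. Thinking tokens are billed as output, so real totals can come out higher than this estimate.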