Gemini Agentic Patterns
Models for NeuralCAD
| Model | Strengths | Context (tokens) | Cost per 1M tokens (input / output) |
|---|---|---|---|
| gemini-2.5-flash | Best price-performance, reasoning | 1M | $0.30 / $2.50 (lowest tier) |
| gemini-2.5-pro | Deep reasoning, proven | 1M | $1.25 / $10 |
| gemini-3-flash-preview | Frontier, balanced | 1M | $0.50 / $3 |
| gemini-3.1-pro-preview | Best reasoning, complex agentic | 1M | $4 / $18 |
NeuralCAD uses gemini-2.5-flash by default. For multi-agent pipelines it is 5-10x cheaper than frontier models: about $0.05 per user turn with 8 API calls.
Function Calling
Declaration
from google import genai
from google.genai import types

tools = types.Tool(function_declarations=[{
    "name": "generate_toolpath",
    "description": "Generate CNC toolpath from CadQuery shape",
    "parameters": {
        "type": "object",
        "properties": {
            "operations": {
                "type": "array",
                "items": {"type": "string",
                          "enum": ["pocket", "profile", "drill", "adaptive"]}
            },
            "tool_diameter": {"type": "number"},
            "post_processor": {"type": "string",
                               "enum": ["grbl", "linuxcnc", "fanuc"]}
        },
        "required": ["operations", "tool_diameter"]
    }
}])
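The `required` and `enum` constraints in the declaration can also be checked on the application side before a requested call is executed. The sketch below is a minimal illustration, assuming the arguments arrive as a plain dict; `validate_args` is a hypothetical helper (not part of the SDK) and covers only the subset of JSON Schema used above.

```python
# Parameter schema copied from the generate_toolpath declaration above.
SCHEMA = {
    "type": "object",
    "properties": {
        "operations": {
            "type": "array",
            "items": {"type": "string",
                      "enum": ["pocket", "profile", "drill", "adaptive"]},
        },
        "tool_diameter": {"type": "number"},
        "post_processor": {"type": "string",
                           "enum": ["grbl", "linuxcnc", "fanuc"]},
    },
    "required": ["operations", "tool_diameter"],
}


def validate_args(args: dict, schema: dict = SCHEMA) -> list:
    """Hypothetical helper: return a list of problems (empty means valid)."""
    problems = []
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(f"unknown argument: {name}")
            continue
        if spec["type"] == "array":
            if not isinstance(value, list):
                problems.append(f"{name}: expected a list")
                continue
            # For arrays, each element is checked against the item enum.
            values, allowed = value, spec["items"].get("enum")
        else:
            values, allowed = [value], spec.get("enum")
        if allowed is not None:
            for v in values:
                if v not in allowed:
                    problems.append(f"{name}: {v!r} is not one of {allowed}")
    return problems
```

Returning the problems as data (rather than raising) makes it easy to send them back to the model as the function response so it can correct its arguments.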
Call-Response Loop
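client = genai.Client()  # reads the API key from the environment
config = types.GenerateContentConfig(tools=[tools])
response = client.models.generate_content(
    model="gemini-2.5-flash", contents=prompt, config=config
)
model_turn = response.candidates[0].content
function_responses = []
for part in model_turn.parts:
    if part.function_call:
        fn = part.function_call
        # execute_function is the application's own dispatcher, not an SDK call
        result = execute_function(fn.name, **fn.args)
        # Build the response part explicitly so the call id is carried through
        function_responses.append(types.Part(function_response=types.FunctionResponse(
            id=fn.id, name=fn.name, response={"result": result}
        )))
# Send the results back to continue the conversation (assumes prompt is a string)
if function_responses:
    user_turn = types.Content(role="user", parts=[types.Part(text=prompt)])
    tool_turn = types.Content(role="user", parts=function_responses)
    response = client.models.generate_content(
        model="gemini-2.5-flash", contents=[user_turn, model_turn, tool_turn], config=config
    )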
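The loop above relies on an `execute_function` dispatcher that the application has to supply. One possible shape is a small name-to-callable registry, sketched below. The `register` decorator, the placeholder `generate_toolpath` body, and the `"grbl"` default are all assumptions for illustration, not NeuralCAD's actual code.

```python
from typing import Any, Callable

# Registry mapping declared function names to local Python callables.
_REGISTRY: dict = {}


def register(name: str) -> Callable:
    """Decorator that exposes a Python function under a declared tool name."""
    def decorator(func: Callable) -> Callable:
        _REGISTRY[name] = func
        return func
    return decorator


@register("generate_toolpath")
def generate_toolpath(operations, tool_diameter, post_processor="grbl"):
    # Placeholder body: a real implementation would call the CAM backend.
    return {
        "operations": list(operations),
        "tool_diameter": tool_diameter,
        "post_processor": post_processor,
    }


def execute_function(name: str, **kwargs) -> Any:
    """Dispatch a model-requested call; report failures as data, not exceptions."""
    func = _REGISTRY.get(name)
    if func is None:
        return {"error": f"unknown function: {name}"}
    try:
        return func(**kwargs)
    except TypeError as exc:  # for example, missing or unexpected arguments
        return {"error": str(exc)}
```

Returning errors as ordinary values lets the error text go back to the model in the function response, so it has a chance to retry with corrected arguments.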
Modes
| Mode | Behavior |
|---|---|
| AUTO | Model decides whether to call a function or respond with text (default) |
| ANY | Model forced to always call a function |
| VALIDATED | Model either calls a function or responds with text; function calls must adhere to the declared schema |
| NONE | Function calling disabled |
allowed_function_names restricts which functions the model can select per turn.
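As a rough sketch of how a mode and `allowed_function_names` fit together, the helper below builds the tool configuration as a plain dict using the field names `function_calling_config`, `mode`, and `allowed_function_names`. The helper itself is hypothetical, and the rule that rejects an allow-list under NONE is this sketch's own sanity check, not documented API behavior.

```python
VALID_MODES = {"AUTO", "ANY", "VALIDATED", "NONE"}


def function_calling_config(mode: str = "AUTO", allowed=None) -> dict:
    """Hypothetical helper: build a tool_config dict for one turn."""
    mode = mode.upper()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    if allowed and mode == "NONE":
        # Sanity check of this sketch: an allow-list is pointless when
        # function calling is disabled.
        raise ValueError("allowed_function_names given but mode is NONE")
    cfg = {"mode": mode}
    if allowed:
        cfg["allowed_function_names"] = list(allowed)
    return {"function_calling_config": cfg}


# Force a call to generate_toolpath on this turn.
forced = function_calling_config("ANY", ["generate_toolpath"])
```

The resulting dict is meant to be supplied as the `tool_config` field of the generation config, alongside `tools`.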
Parallel Calls
Gemini can request multiple independent function calls in a single response. Each has a unique id. Execute all, return all results mapped by id in one response.
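A minimal sketch of the execute-all, return-by-id step, using only the standard library. `Call` is a stand-in for the function-call objects in the model response, and `handlers` is a hypothetical name-to-callable mapping; neither comes from the SDK.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Call:
    """Stand-in for a model-issued function call (id, name, args)."""
    id: str
    name: str
    args: dict = field(default_factory=dict)


def run_parallel(calls, handlers):
    """Run independent calls concurrently and key each result by call id."""
    def run_one(call):
        try:
            return {"result": handlers[call.name](**call.args)}
        except Exception as exc:  # report failures per call instead of aborting
            return {"error": f"{type(exc).__name__}: {exc}"}

    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        results = list(pool.map(run_one, calls))  # map preserves input order
    return {
        call.id: {"name": call.name, "response": res}
        for call, res in zip(calls, results)
    }
```

Each entry in the returned mapping has what a function response needs (id, name, response payload), so all results can be sent back together in one turn.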
Limits
- Keep the active tool set to at most 10-20 tools per call
- Use temperature 1.0 (lower values can cause unexpected behavior)
- Function descriptions count toward input token limits
Thinking / Reasoning
Gemini 2.5 (thinking_budget)
config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=8192)
)
| Model | Range | Default |
|---|---|---|
| 2.5 Pro | 128-32,768 tokens | Dynamic (cannot disable) |
| 2.5 Flash | 0-24,576 tokens | Dynamic; disable with 0 |
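Since the valid range differs per model, a pipeline that switches models may want to clamp a requested budget before building the config. A small sketch using the ranges from the table above; `clamp_thinking_budget` is a hypothetical helper.

```python
# (min, max) thinking_budget in tokens per model, from the table above.
BUDGET_RANGES = {
    "gemini-2.5-pro": (128, 32_768),
    "gemini-2.5-flash": (0, 24_576),
}


def clamp_thinking_budget(model: str, requested: int) -> int:
    """Clamp a requested budget into the model's supported range."""
    low, high = BUDGET_RANGES[model]
    return max(low, min(high, requested))
```

For example, a request of 0 (thinking disabled) stays 0 on 2.5 Flash but is raised to 128 on 2.5 Pro, where thinking cannot be turned off.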
Gemini 3.x (thinking_level)
config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_level="high")
)
Levels: minimal, low, medium, high. Default is "high" for 3 Flash and 3.1 Pro.
Thinking tokens are billed as output tokens. Check response.usage_metadata.thoughts_token_count.
Thinking "significantly improves function calling performance" -- better tool selection and parameter inference.
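Because thinking tokens are billed at the output rate, a per-call cost estimate needs to add them to the visible output tokens. The sketch below assumes that `candidates_token_count` does not already include the thinking tokens; `Usage` is a stand-in for `response.usage_metadata` with only the fields used here.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Stand-in for response.usage_metadata (only the fields used here)."""
    prompt_token_count: int
    candidates_token_count: int
    thoughts_token_count: int = 0


def call_cost_usd(usage: Usage, in_price: float, out_price: float) -> float:
    """Prices are USD per 1M tokens; thinking tokens use the output price."""
    billed_output = usage.candidates_token_count + (usage.thoughts_token_count or 0)
    return (usage.prompt_token_count * in_price + billed_output * out_price) / 1_000_000


# One gemini-2.5-pro call that used the full 8192-token thinking budget.
cost = call_cost_usd(Usage(4_000, 2_000, 8_192), in_price=1.25, out_price=10.0)
```

In this illustrative case the thinking tokens account for most of the cost, which is why the budget is worth tuning per agent.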
Code Execution
Built-in sandbox runs Python with 43+ libraries (numpy, scipy, matplotlib, sympy, etc.). CadQuery is NOT available in the sandbox. Useful for mathematical calculations (feed/speed, chip load), not for CAD execution. Max 30 seconds per execution.
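The kind of calculation that suits the sandbox is plain arithmetic such as spindle speed and feed rate. The sketch below uses the standard machining formulas; the input values are illustrative only, not cutting recommendations.

```python
import math


def spindle_rpm(surface_speed_m_min: float, tool_diameter_mm: float) -> float:
    """RPM = (Vc * 1000) / (pi * D), with Vc in m/min and D in mm."""
    return surface_speed_m_min * 1000 / (math.pi * tool_diameter_mm)


def feed_rate_mm_min(rpm: float, flutes: int, chip_load_mm: float) -> float:
    """Feed (mm/min) = RPM * number of flutes * chip load per tooth (mm)."""
    return rpm * flutes * chip_load_mm


# Illustrative inputs: 150 m/min surface speed, 6 mm two-flute end mill,
# 0.03 mm chip load per tooth.
rpm = spindle_rpm(150, 6)
feed = feed_rate_mm_min(rpm, flutes=2, chip_load_mm=0.03)
```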
Google Agent Development Kit (ADK)
from google.adk.agents import LlmAgent, SequentialAgent, ParallelAgent, LoopAgent

# Sequential pipeline: each agent saves its output to session state under output_key
# (the workflow agent names below are illustrative)
pipeline = SequentialAgent(name="DesignPipeline", sub_agents=[
    LlmAgent(name="Designer", output_key="design_spec", ...),
    LlmAgent(name="Engineer", output_key="eng_spec", ...),
    LlmAgent(name="CADCoder", output_key="cad_code", tools=[generate_cadquery], ...),
])

# Self-correction loop: repeats until a sub-agent escalates (for example, once
# code_validator writes PASS to the "feedback" state key) or max_iterations is reached
loop = LoopAgent(
    name="SelfCorrection",
    sub_agents=[cad_generator, code_validator],
    max_iterations=3,
)

# Parallel validation
reviews = ParallelAgent(name="Reviews", sub_agents=[cnc_check, structural_check, cost_check])
ADK provides native Gemini integration, multi-language support (Python/TypeScript/Go/Java), and deployment to Vertex AI.
Agentic Architecture for CNC
Full Pipeline with Gemini
User Prompt
  -> Gemini (thinking=high) -> routes to agents via function calling
  -> Design/Engineering/CNC agents provide structured feedback
  -> CAD Coder generates CadQuery code (thinking_budget=8192)
  -> Execute CadQuery on server
       -> If error: feed back to Gemini -> retry (up to 3 times)
       -> If success: validate_for_cnc
  -> CAM Agent selects operations/tools
  -> generate_toolpath via ocp-freecad-cam
  -> Return G-code + 3D preview data
Self-Correction Performance
With Gemini 2.0 Flash, CadQuery generation succeeds 53% of the time zero-shot and 85% of the time with an error feedback loop. NeuralCAD already implements this loop in _execute_cad_code().
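The general shape of such an error feedback loop can be sketched as follows. This is not the actual `_execute_cad_code()` implementation: `generate` and `execute` are hypothetical callables standing in for the model call and the server-side CadQuery run, and the limit of three attempts is an assumption.

```python
from typing import Callable, Optional


def generate_with_retries(
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], object],
    prompt: str,
    max_attempts: int = 3,
):
    """Generate code, run it, and feed any error into the next attempt."""
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        # The previous error (None on the first attempt) is passed back so
        # the generator can correct its output.
        code = generate(prompt, error)
        try:
            return execute(code), attempt
        except Exception as exc:
            error = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"failed after {max_attempts} attempts; last error: {error}")
```

In practice `generate` would include the error text in the follow-up prompt, and `execute` would run the code in an isolated process rather than in the web server itself.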
Cost per Turn (8 API calls, ~4K in + ~2K out each)
| Model | Total per turn |
|---|---|
| Gemini 2.5 Flash | $0.05 |
| Gemini 2.5 Pro | $0.20 |
| Claude Sonnet 4.6 | $0.34 |
| GPT-5.4 | $0.32 |
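The per-turn figures follow from simple arithmetic over the token counts in the heading. A sketch of that calculation: the gemini-3-flash-preview prices come from the model table, while the $0.30 / $2.50 per 1M tokens used for gemini-2.5-flash is an assumed list price that may change.

```python
def cost_per_turn(in_price, out_price, calls=8, tokens_in=4_000, tokens_out=2_000):
    """Prices are USD per 1M tokens; returns the USD cost of one user turn."""
    total_in = calls * tokens_in      # 32K input tokens per turn
    total_out = calls * tokens_out    # 16K output tokens per turn
    return (total_in * in_price + total_out * out_price) / 1_000_000


flash_25 = cost_per_turn(0.30, 2.50)  # assumed gemini-2.5-flash list price
flash_3 = cost_per_turn(0.50, 3.00)   # gemini-3-flash-preview, from the model table
```

Under these assumptions the gemini-2.5-flash turn comes to just under $0.05, which matches the table. Output tokens dominate the total, and any thinking tokens would add to them.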