# Gemini Agentic Patterns

## Models for NeuralCAD

| Model | Strengths | Context | Price per 1M tokens (in / out) |
|-------|-----------|---------|--------------------------------|
| gemini-2.5-flash | Best price-performance, reasoning | 1M | $0.30 / $2.50 |
| gemini-2.5-pro | Deep reasoning, proven | 1M | $1.25 / $10 (prompts up to 200K tokens) |
| gemini-3-flash-preview | Frontier, balanced | 1M | $0.50 / $3 |
| gemini-3.1-pro-preview | Best reasoning, complex agentic | 1M | $4 / $18 |

NeuralCAD uses `gemini-2.5-flash` by default. For multi-agent pipelines it is 5-10x cheaper than frontier models, at about $0.05 per user turn with 8 API calls.
## Function Calling

### Declaration

```python
from google import genai
from google.genai import types

# A single Tool can bundle several function declarations. The SDK accepts plain dicts
# whose "parameters" entry follows an OpenAPI-style schema.
tools = types.Tool(function_declarations=[{
    "name": "generate_toolpath",
    "description": "Generate CNC toolpath from CadQuery shape",
    "parameters": {
        "type": "object",
        "properties": {
            "operations": {
                "type": "array",
                "items": {"type": "string",
                          "enum": ["pocket", "profile", "drill", "adaptive"]}
            },
            "tool_diameter": {"type": "number"},
            "post_processor": {"type": "string",
                               "enum": ["grbl", "linuxcnc", "fanuc"]}
        },
        "required": ["operations", "tool_diameter"]
    }
}])
```
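
The model fills in `generate_toolpath`'s arguments itself, so checking them against the declared schema before anything runs can catch bad values early. The sketch below is a minimal example in plain Python. `check_args` is a hypothetical helper, not part of the SDK, and it covers only the `required` list and the `enum` constraints used above. It is not a full JSON Schema validator.

```python
from typing import Any

# Same "parameters" schema as the generate_toolpath declaration
TOOLPATH_PARAMS = {
    "type": "object",
    "properties": {
        "operations": {"type": "array",
                       "items": {"type": "string",
                                 "enum": ["pocket", "profile", "drill", "adaptive"]}},
        "tool_diameter": {"type": "number"},
        "post_processor": {"type": "string", "enum": ["grbl", "linuxcnc", "fanuc"]},
    },
    "required": ["operations", "tool_diameter"],
}

def check_args(schema: dict, args: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems (empty means the args look valid)."""
    problems = [f"missing required argument: {name}"
                for name in schema.get("required", []) if name not in args]
    for name, value in args.items():
        prop = schema["properties"].get(name)
        if prop is None:
            problems.append(f"unexpected argument: {name}")
            continue
        # For arrays, check each item against the item schema's enum (if any)
        is_array = prop.get("type") == "array"
        values = value if is_array else [value]
        allowed = (prop.get("items", {}) if is_array else prop).get("enum")
        if allowed is not None:
            problems += [f"{name}: {v!r} not in {allowed}" for v in values if v not in allowed]
    return problems

print(check_args(TOOLPATH_PARAMS, {"operations": ["pocket", "engrave"], "post_processor": "grbl"}))
```

Problems found this way can go back to the model as the function response, so it can correct the call instead of the server executing it.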
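### Call-Response Loop

```python
client = genai.Client()  # reads the API key from GEMINI_API_KEY (or GOOGLE_API_KEY)
config = types.GenerateContentConfig(tools=[tools])
contents = [types.Content(role="user", parts=[types.Part(text=prompt)])]

response = client.models.generate_content(
    model="gemini-2.5-flash", contents=contents, config=config
)
model_turn = response.candidates[0].content
result_parts = []
for part in model_turn.parts:
    if part.function_call:
        fn = part.function_call
        # execute_function is the app's own dispatcher, not an SDK call
        result = execute_function(fn.name, **(fn.args or {}))
        result_parts.append(types.Part(function_response=types.FunctionResponse(
            id=fn.id, name=fn.name, response={"result": result}
        )))

if result_parts:
    # Send the model's turn plus the results back; repeat while it keeps calling tools
    contents += [model_turn, types.Content(role="user", parts=result_parts)]
    response = client.models.generate_content(
        model="gemini-2.5-flash", contents=contents, config=config
    )
print(response.text)
```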
### Modes

| Mode | Behavior |
|------|----------|
| AUTO | Model decides whether to call a function or respond with text (default) |
| ANY | Model must always call a function, and calls follow the declared schema |
| VALIDATED | Model may call a function or respond with text, but any function call is constrained to the declared schema |
| NONE | Function calling disabled |

`allowed_function_names` (used with ANY or VALIDATED mode) restricts which declared functions the model can select.
### Parallel Calls

Gemini can request multiple independent function calls in a single response, and each call has its own `id`. Execute all of them, then return every result in one follow-up message, matching each result to its call by `id`.
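
Here is a minimal sketch of the execute-all-then-reply step in plain Python. The calls are modeled as dicts with the same `id`, `name` and `args` fields as the SDK's `FunctionCall`. `registry` is a hypothetical mapping from function names to Python callables. Each returned payload would then be wrapped in a `FunctionResponse` with the matching `id`, and all of them sent back in a single message.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

def run_parallel_calls(calls: list[dict], registry: dict[str, Callable[..., Any]]) -> dict[str, dict]:
    """Run every requested call concurrently and return {call_id: response payload}."""
    def run_one(call: dict) -> tuple[str, dict]:
        try:
            result = registry[call["name"]](**(call.get("args") or {}))
            payload = {"result": result}
        except Exception as exc:  # report the failure to the model instead of aborting the turn
            payload = {"error": f"{type(exc).__name__}: {exc}"}
        return call["id"], {"name": call["name"], "response": payload}

    with ThreadPoolExecutor() as pool:
        return dict(pool.map(run_one, calls))

# Hypothetical tools and a model response containing three calls
registry = {
    "drill": lambda depth: f"drill {depth} mm",
    "pocket": lambda width: f"pocket {width} mm",
}
calls = [
    {"id": "call-1", "name": "drill", "args": {"depth": 5}},
    {"id": "call-2", "name": "pocket", "args": {"width": 10}},
    {"id": "call-3", "name": "chamfer", "args": {}},  # not in the registry
]
responses = run_parallel_calls(calls, registry)
print(responses["call-3"]["response"])
```

Threads suit I/O-bound tools. A process pool may be a better fit for CPU-heavy work such as CadQuery geometry.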
### Limits

- Keep the active tool set to roughly 10-20 tools per request
- For Gemini 3 models, keep temperature at the default 1.0 (lower values can cause looping or degraded performance)
- Function declarations (names, descriptions, parameter schemas) count toward the input token limit
## Thinking / Reasoning

### Gemini 2.5 (thinking_budget)

```python
config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=8192)
)
```

| Model | `thinking_budget` range | Default |
|-------|-------------------------|---------|
| 2.5 Pro | 128-32,768 tokens | Dynamic (cannot disable) |
| 2.5 Flash | 0-24,576 tokens | Dynamic; disable with 0 |
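
It is easy to pass an out-of-range budget when switching between these two models. A small guard can clamp a requested `thinking_budget` to the ranges in the table. The sketch below assumes those ranges are current. `clamp_thinking_budget` is a hypothetical helper, and `-1` is the value the Gemini API uses to request dynamic thinking.

```python
# thinking_budget ranges for Gemini 2.5, taken from the table above (assumed current)
THINKING_BUDGET_RANGES = {
    "gemini-2.5-pro": (128, 32_768),   # thinking cannot be disabled
    "gemini-2.5-flash": (0, 24_576),   # 0 disables thinking
}
DYNAMIC = -1  # let the model pick its own budget

def clamp_thinking_budget(model: str, budget: int) -> int:
    """Return a budget the model accepts, clamping out-of-range requests."""
    if budget == DYNAMIC:
        return budget
    low, high = THINKING_BUDGET_RANGES[model]
    return max(low, min(budget, high))

print(clamp_thinking_budget("gemini-2.5-pro", 0))         # -> 128
print(clamp_thinking_budget("gemini-2.5-flash", 50_000))  # -> 24576
```

The result would be passed as `types.ThinkingConfig(thinking_budget=clamp_thinking_budget(model, 8192))`.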
### Gemini 3.x (thinking_level)

```python
config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_level="high")
)
```

Levels: minimal, low, medium, high (not every level is available on every model). Default is "high" for 3 Flash and 3.1 Pro.

Thinking tokens are **billed as output tokens**. The count is reported in `response.usage_metadata.thoughts_token_count`.

Thinking "significantly improves function calling performance", giving better tool selection and parameter inference.
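
Thinking tokens are billed at the output rate, so a per-request cost estimate has to add `thoughts_token_count` to the visible output. The sketch below takes the usage counts as a plain dict keyed by the `usage_metadata` field names. Prices are passed in per 1M tokens rather than hard-coded. The token counts in the example are made-up illustrative numbers.

```python
def request_cost_usd(usage: dict, in_per_m: float, out_per_m: float) -> float:
    """Estimate one request's cost; thinking tokens are charged at the output price."""
    def count(key: str) -> int:
        return usage.get(key) or 0  # fields can be missing or None (e.g. no thinking)

    output_tokens = count("candidates_token_count") + count("thoughts_token_count")
    return (count("prompt_token_count") * in_per_m + output_tokens * out_per_m) / 1_000_000

# Illustrative counts: 4K prompt, 2K visible output, 3K thinking, priced at $0.30 / $2.50 per 1M
usage = {"prompt_token_count": 4_000, "candidates_token_count": 2_000, "thoughts_token_count": 3_000}
print(f"${request_cost_usd(usage, 0.30, 2.50):.4f}")
```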
## Code Execution

The built-in sandbox runs Python with 43+ libraries (numpy, scipy, matplotlib, sympy, etc.). **CadQuery is NOT available** in the sandbox, so it is useful for mathematical calculations (feed/speed, chip load) but not for CAD execution. Each execution is limited to 30 seconds.
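
The standard milling formulas are a good example of the kind of calculation the sandbox suits. Spindle speed is n = Vc × 1000 / (π × D), and feed rate is Vf = n × z × fz. Both fit in a few lines. The cutting speed and chip load below are illustrative assumptions for a 6 mm two-flute end mill, not recommended cutting data. The same code also runs locally.

```python
import math

def spindle_rpm(cutting_speed_m_min: float, tool_diameter_mm: float) -> float:
    """n = Vc * 1000 / (pi * D), with Vc in m/min and D in mm."""
    return cutting_speed_m_min * 1000 / (math.pi * tool_diameter_mm)

def feed_rate_mm_min(rpm: float, flutes: int, chip_load_mm: float) -> float:
    """Vf = n * z * fz, with fz (chip load) in mm per tooth."""
    return rpm * flutes * chip_load_mm

# Illustrative values only: 6 mm, 2-flute end mill, Vc = 200 m/min, fz = 0.04 mm/tooth
rpm = spindle_rpm(200, 6)
feed = feed_rate_mm_min(rpm, flutes=2, chip_load_mm=0.04)
print(f"{rpm:.0f} rpm, {feed:.0f} mm/min")  # -> 10610 rpm, 849 mm/min
```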
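## Google Agent Development Kit (ADK)

```python
from google.adk.agents import LlmAgent, LoopAgent, ParallelAgent, SequentialAgent
from google.adk.tools.tool_context import ToolContext

MODEL = "gemini-2.5-flash"
# generate_cadquery (a tool function), cad_generator, cnc_check, structural_check
# and cost_check (LlmAgents) are app-defined and not shown here.

def exit_loop(tool_context: ToolContext) -> dict:
    """Tool the validator calls on PASS; escalating ends the enclosing LoopAgent."""
    tool_context.actions.escalate = True
    return {}

# Sequential pipeline: each agent's reply is stored in session state under output_key
pipeline = SequentialAgent(name="DesignPipeline", sub_agents=[
    LlmAgent(name="Designer", model=MODEL, output_key="design_spec",
             instruction="Turn the user request into a design spec."),
    LlmAgent(name="Engineer", model=MODEL, output_key="eng_spec",
             instruction="Add engineering constraints to: {design_spec}"),
    LlmAgent(name="CADCoder", model=MODEL, output_key="cad_code", tools=[generate_cadquery],
             instruction="Write CadQuery code for: {eng_spec}"),
])
# Self-correction loop: stops when the validator calls exit_loop, or after 3 iterations
code_validator = LlmAgent(
    name="Validator", model=MODEL, output_key="feedback", tools=[exit_loop],
    instruction="Check {cad_code}. If it is correct call exit_loop, else describe the error.",
)
loop = LoopAgent(name="CADFixLoop", sub_agents=[cad_generator, code_validator], max_iterations=3)
# Parallel validation: the reviewer agents run concurrently
reviews = ParallelAgent(name="Reviews", sub_agents=[cnc_check, structural_check, cost_check])
```

ADK provides native Gemini integration, multi-language support (Python/TypeScript/Go/Java), and deployment to Vertex AI.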
## Agentic Architecture for CNC

### Full Pipeline with Gemini

```
User Prompt
  -> Gemini (thinking=high) -> routes to agents via function calling
  -> Design/Engineering/CNC agents provide structured feedback
  -> CAD Coder generates CadQuery code (thinking_budget=8192)
  -> Execute CadQuery on server
       -> If error: feed back to Gemini -> retry (up to 3 times)
       -> If success: validate_for_cnc
  -> CAM Agent selects operations/tools
  -> generate_toolpath via ocp-freecad-cam
  -> Return G-code + 3D preview data
```
### Self-Correction Performance

Gemini 2.0 Flash CadQuery generation: 53% success zero-shot, **85% with error feedback loop**. NeuralCAD already implements this in `_execute_cad_code()`.
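
The error-feedback loop can be sketched without any model SDK. The loop generates code, tries to run it, and on failure passes the traceback into the next generation request. This is a generic sketch, not NeuralCAD's `_execute_cad_code()`. `generate` and `execute` are hypothetical callables standing in for the Gemini call and the server-side CadQuery run. The retry count follows the pipeline's "up to 3 times".

```python
import traceback
from typing import Any, Callable, Optional

def generate_with_feedback(
    generate: Callable[[str, Optional[str]], str],  # (prompt, last_error) -> code
    execute: Callable[[str], Any],                  # runs code, raises on failure
    prompt: str,
    max_retries: int = 3,
) -> tuple[Any, int]:
    """Return (result, attempts_used); raise RuntimeError once retries are exhausted."""
    error: Optional[str] = None
    for attempt in range(1, max_retries + 2):  # one initial attempt plus max_retries retries
        code = generate(prompt, error)
        try:
            return execute(code), attempt
        except Exception:
            error = traceback.format_exc()  # fed back into the next generate() call
    raise RuntimeError(f"still failing after {max_retries} retries:\n{error}")

# Stand-ins: the first draft fails, the corrected draft succeeds
def fake_generate(prompt: str, error: Optional[str]) -> str:
    return "draft" if error is None else "fixed"

def fake_execute(code: str) -> str:
    if code == "draft":
        raise ValueError("invalid fillet radius")
    return "solid"

result, attempts = generate_with_feedback(fake_generate, fake_execute, "M6 mounting bracket")
print(result, attempts)  # -> solid 2
```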
### Cost per Turn (8 API calls, ~4K in + ~2K out each)

| Model | Total per turn |
|-------|----------------|
| Gemini 2.5 Flash | $0.05 |
| Gemini 2.5 Pro | $0.20 |
| Claude Sonnet 4.6 | $0.34 |
| GPT-5.4 | $0.32 |
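
The per-turn figures come from simple arithmetic: calls × (input tokens × input price + output tokens × output price). The sketch below uses the 8 × (4K in, 2K out) assumption from the heading. The prices are assumed list prices in USD per 1M tokens and should be checked against current pricing pages. Thinking tokens are not included, so thinking-heavy calls will cost more.

```python
# Assumed list prices in USD per 1M tokens: (input, output). Verify before relying on them.
PRICES = {
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
}

def cost_per_turn(model: str, calls: int = 8, tokens_in: int = 4_000, tokens_out: int = 2_000) -> float:
    """Cost of one user turn, ignoring thinking tokens."""
    price_in, price_out = PRICES[model]
    return calls * (tokens_in * price_in + tokens_out * price_out) / 1_000_000

for model in PRICES:
    print(f"{model}: ${cost_per_turn(model):.3f}")
```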