Gemini Agentic Patterns

Models for NeuralCAD

Model	Strengths	Context	Cost (1M tokens in/out)
gemini-2.5-flash	Best price-performance, reasoning	1M	Lowest tier
gemini-2.5-pro	Deep reasoning, proven	1M	$1.25 / $10
gemini-3-flash-preview	Frontier, balanced	1M	$0.50 / $3
gemini-3.1-pro-preview	Best reasoning, complex agentic	1M	$4 / $18

NeuralCAD uses gemini-2.5-flash by default -- 5-10x cheaper than frontier models for multi-agent pipelines (~$0.05 per user turn with 8 API calls).

Function Calling

Declaration

from google import genai
from google.genai import types

tools = types.Tool(function_declarations=[{
    "name": "generate_toolpath",
    "description": "Generate CNC toolpath from CadQuery shape",
    "parameters": {
        "type": "object",
        "properties": {
            "operations": {
                "type": "array",
                "items": {"type": "string",
                          "enum": ["pocket", "profile", "drill", "adaptive"]}
            },
            "tool_diameter": {"type": "number"},
            "post_processor": {"type": "string",
                               "enum": ["grbl", "linuxcnc", "fanuc"]}
        },
        "required": ["operations", "tool_diameter"]
    }
}])
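Arguments still arrive as model output, so it can be worth checking them against the declaration before anything runs on the server. The following is a minimal sketch, assuming a hypothetical helper `validate_toolpath_args` (not part of the google-genai SDK) that mirrors the enums and required fields declared above. The "positive diameter" rule is an extra assumption, since the schema only says `number`.

```python
ALLOWED_OPERATIONS = {"pocket", "profile", "drill", "adaptive"}
ALLOWED_POST_PROCESSORS = {"grbl", "linuxcnc", "fanuc"}


def validate_toolpath_args(args: dict) -> list:
    """Return a list of problems; an empty list means the args look usable."""
    problems = []

    # Required fields, as listed under "required" in the declaration.
    for key in ("operations", "tool_diameter"):
        if key not in args:
            problems.append(f"missing required field: {key}")

    # Every requested operation must be one of the declared enum values.
    operations = args.get("operations", [])
    if not isinstance(operations, list):
        problems.append("operations must be a list")
    else:
        for op in operations:
            if op not in ALLOWED_OPERATIONS:
                problems.append(f"unknown operation: {op}")

    # Assumption: a usable tool diameter is a positive number (bool excluded).
    diameter = args.get("tool_diameter")
    if diameter is not None:
        is_number = isinstance(diameter, (int, float)) and not isinstance(diameter, bool)
        if not is_number or diameter <= 0:
            problems.append("tool_diameter must be a positive number")

    # post_processor is optional, but must match the enum when present.
    post = args.get("post_processor")
    if post is not None and post not in ALLOWED_POST_PROCESSORS:
        problems.append(f"unknown post_processor: {post}")

    return problems
```

Returning the problem list as the function response (instead of raising) gives the model a chance to correct its own arguments on the next turn.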

Call-Response Loop

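client = genai.Client()  # picks up the API key from the environment
config = types.GenerateContentConfig(tools=[tools])
response = client.models.generate_content(
    model="gemini-2.5-flash", contents=prompt, config=config
)

results = []
for part in response.candidates[0].content.parts:
    if part.function_call:
        fn = part.function_call
        # execute_function is the application's own dispatcher, not an SDK call
        result = execute_function(fn.name, **fn.args)
        # Build the part explicitly so the call id travels with the result
        results.append(types.Part(function_response=types.FunctionResponse(
            name=fn.name, response={"result": result}, id=fn.id
        )))

# Send results back to continue the conversation (prompt assumed to be a string)
if results:
    history = [types.Content(role="user", parts=[types.Part(text=prompt)]),
               response.candidates[0].content,
               types.Content(role="user", parts=results)]
    response = client.models.generate_content(
        model="gemini-2.5-flash", contents=history, config=config
    )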

Modes

Mode	Behavior
AUTO	Model decides whether to call a function or respond with text (default)
ANY	Model forced to always call a function
VALIDATED	Model may call a function or respond with text; any function call must adhere to the declared schema
NONE	Function calling disabled

allowed_function_names restricts which functions the model can select per turn; it is typically set together with mode ANY.
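As a configuration sketch, forcing a call to one specific function might look like the following. It assumes the google-genai SDK is installed and that `generate_toolpath` has been declared as a tool elsewhere; the choice of mode and function name here is only illustrative.

```python
from google.genai import types

# Force a function call and limit the model's choice to a single function.
tool_config = types.ToolConfig(
    function_calling_config=types.FunctionCallingConfig(
        mode="ANY",
        allowed_function_names=["generate_toolpath"],
    )
)

# Pass tool_config alongside the tool declarations when building the request config.
config = types.GenerateContentConfig(tool_config=tool_config)
```

Narrowing the allowed names per turn is one way to stay under the recommended tool count without redeclaring tools for every pipeline stage.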

Parallel Calls

Gemini can request multiple independent function calls in a single response. Each has a unique id. Execute all, return all results mapped by id in one response.
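The execute-all-then-answer-by-id step can be sketched without the SDK. The example below uses a hypothetical `PendingCall` dataclass as a stand-in for the model's function call objects, and two made-up handlers (`check_wall_thickness`, `estimate_cut_time`) that are not part of NeuralCAD.

```python
from dataclasses import dataclass


@dataclass
class PendingCall:
    """Stand-in for one function call requested by the model."""
    id: str
    name: str
    args: dict


# Hypothetical handlers, for illustration only.
def check_wall_thickness(min_mm: float) -> dict:
    return {"ok": min_mm >= 1.0}


def estimate_cut_time(path_mm: float, feed_mm_min: float) -> dict:
    return {"minutes": path_mm / feed_mm_min}


HANDLERS = {
    "check_wall_thickness": check_wall_thickness,
    "estimate_cut_time": estimate_cut_time,
}


def run_all(calls: list) -> dict:
    """Execute every requested call and key each outcome by the call's id."""
    results = {}
    for call in calls:
        handler = HANDLERS.get(call.name)
        if handler is None:
            results[call.id] = {"error": f"unknown function: {call.name}"}
            continue
        try:
            results[call.id] = {"result": handler(**call.args)}
        except Exception as exc:  # report the failure instead of dropping the id
            results[call.id] = {"error": str(exc)}
    return results
```

Every id gets an entry even when a handler fails, so the follow-up message can answer each requested call.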

Limits

Recommend 10-20 active tools maximum per call
Temperature 1.0 recommended for Gemini 3 models (lower values can cause unexpected behavior such as looping)
Function descriptions count toward input token limits

Thinking / Reasoning

Gemini 2.5 (thinking_budget)

config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=8192)
)

Model	Range	Default
2.5 Pro	128-32,768 tokens	Dynamic (cannot disable)
2.5 Flash	0-24,576 tokens	Dynamic; disable with 0
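When budgets come from user settings or per-agent configuration, a small guard can keep them inside the ranges in the table. This is a sketch with a hypothetical helper, `clamp_thinking_budget`; the model ids are assumed to match the names used in the models table.

```python
# Ranges copied from the table above: (minimum, maximum) thinking tokens.
THINKING_BUDGET_RANGES = {
    "gemini-2.5-pro": (128, 32768),   # thinking cannot be disabled
    "gemini-2.5-flash": (0, 24576),   # 0 disables thinking
}


def clamp_thinking_budget(model: str, requested: int) -> int:
    """Clamp a requested budget into the model's supported range."""
    if requested < 0:
        raise ValueError("requested budget must be >= 0")
    low, high = THINKING_BUDGET_RANGES[model]
    return max(low, min(high, requested))
```

With this rule a request of 0 stays 0 on 2.5 Flash (thinking disabled) but becomes the 128-token minimum on 2.5 Pro.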

Gemini 3.x (thinking_level)

config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_level="high")
)

Levels: minimal, low, medium, high; which levels are available can vary by model. Default is "high" for 3 Flash and 3.1 Pro.

Thinking tokens are billed as output tokens. Check response.usage_metadata.thoughts_token_count.

Thinking "significantly improves function calling performance" -- better tool selection and parameter inference.

Code Execution

Built-in sandbox runs Python with 43+ libraries (numpy, scipy, matplotlib, sympy, etc.). CadQuery is NOT available in the sandbox. Useful for mathematical calculations (feed/speed, chip load), not for CAD execution. Max 30 seconds per execution.
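The kind of calculation that suits the sandbox is plain arithmetic like the standard milling relations below: spindle speed from surface speed and tool diameter, then feed rate from spindle speed, flute count, and chip load. The function names are illustrative, and any input values are the caller's assumptions rather than recommended cutting data.

```python
import math


def spindle_rpm(surface_speed_m_min: float, tool_diameter_mm: float) -> float:
    """RPM = (surface speed in m/min * 1000) / (pi * tool diameter in mm)."""
    return surface_speed_m_min * 1000.0 / (math.pi * tool_diameter_mm)


def feed_rate_mm_min(rpm: float, flutes: int, chip_load_mm: float) -> float:
    """Feed (mm/min) = RPM * number of flutes * chip load per tooth (mm)."""
    return rpm * flutes * chip_load_mm


# Example inputs (illustrative only): 6 mm two-flute end mill at 200 m/min.
rpm = spindle_rpm(surface_speed_m_min=200.0, tool_diameter_mm=6.0)
feed = feed_rate_mm_min(rpm, flutes=2, chip_load_mm=0.05)
```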

Google Agent Development Kit (ADK)

from google.adk.agents import LlmAgent, SequentialAgent, ParallelAgent, LoopAgent

# Sequential pipeline: each agent stores its output in session state under output_key
pipeline = SequentialAgent(name="DesignPipeline", sub_agents=[
    LlmAgent(name="Designer", output_key="design_spec", ...),
    LlmAgent(name="Engineer", output_key="eng_spec", ...),
    LlmAgent(name="CADCoder", output_key="cad_code", tools=[generate_cadquery], ...),
])

# Self-correction loop: runs until max_iterations is reached or a sub-agent
# escalates (e.g. code_validator ends the loop once its feedback is "PASS")
loop = LoopAgent(
    name="CadRefinementLoop",
    sub_agents=[cad_generator, code_validator],
    max_iterations=3,
)

# Parallel validation: independent reviewers run concurrently
reviews = ParallelAgent(
    name="Reviews", sub_agents=[cnc_check, structural_check, cost_check]
)

ADK provides native Gemini integration, multi-language support (Python/TypeScript/Go/Java), and deployment to Vertex AI.

Agentic Architecture for CNC

Full Pipeline with Gemini

User Prompt
  -> Gemini (thinking=high) -> routes to agents via function calling
  -> Design/Engineering/CNC agents provide structured feedback
  -> CAD Coder generates CadQuery code (thinking_budget=8192)
  -> Execute CadQuery on server
  -> If error: feed back to Gemini -> retry (up to 3 times)
  -> If success: validate_for_cnc
  -> CAM Agent selects operations/tools
  -> generate_toolpath via ocp-freecad-cam
  -> Return G-code + 3D preview data
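The error-feedback step in the pipeline can be sketched as a small loop with pluggable callables. This is not NeuralCAD's `_execute_cad_code()`. The `generate` and `execute` parameters are hypothetical stand-ins for the Gemini call and the server-side CadQuery run, and "up to 3 times" is read here as three attempts in total.

```python
from typing import Callable, List, Optional, Tuple


def generate_with_retries(
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], Tuple[bool, str]],
    prompt: str,
    max_attempts: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Generate code, run it, and feed any error into the next attempt.

    Returns (code, errors): code is None if every attempt failed, and
    errors lists the failure messages in the order they occurred.
    """
    errors: List[str] = []
    last_error: Optional[str] = None
    for _ in range(max_attempts):
        # The previous error (if any) is passed back so the model can fix it.
        code = generate(prompt, last_error)
        ok, detail = execute(code)
        if ok:
            return code, errors
        last_error = detail
        errors.append(detail)
    return None, errors
```

Keeping the model call and the executor as parameters makes the loop easy to test with stubs before wiring in real API calls.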

Self-Correction Performance

Gemini 2.0 Flash CadQuery generation: 53% success zero-shot, 85% with error feedback loop. NeuralCAD already implements this in _execute_cad_code().

Cost per Turn (8 API calls, ~4K in + ~2K out each)

Model	Total per turn
Gemini 2.5 Flash	$0.05
Gemini 2.5 Pro	$0.26
Claude Sonnet 4.6	$0.34
GPT-5.4	$0.32
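A per-turn estimate is the per-call token cost multiplied by the number of calls. The sketch below uses a hypothetical helper, `cost_per_turn`, and plugs in the gemini-3-flash-preview list prices from the models table ($0.50 in, $3 out per 1M tokens). Thinking tokens are billed as output, so real totals depend on how much the model thinks.

```python
def cost_per_turn(
    calls: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Estimated USD cost of one user turn made of `calls` identical API calls."""
    per_call = (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000
    return calls * per_call


# 8 calls, ~4K tokens in and ~2K tokens out each, at $0.50 / $3 per 1M tokens.
estimate = cost_per_turn(8, 4_000, 2_000, 0.50, 3.00)
```

With these inputs each call costs about $0.008, so the turn comes to roughly $0.064 before any thinking tokens are counted.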