CallMeDaniel

docs: add CNC/CAM knowledge wiki and slicer/preview design spec

265686c about 1 month ago

preview code

raw

history blame contribute delete

5.45 kB

# Gemini Agentic Patterns

## Models for NeuralCAD

| Model | Strengths | Context window | Cost per 1M tokens (in / out) |
|-------|-----------|----------------|-------------------------------|
| gemini-2.5-flash | Best price-performance, reasoning | 1M | $0.30 / $2.50 (lowest tier) |
| gemini-2.5-pro | Deep reasoning, proven | 1M | $1.25 / $10 (prompts up to 200K tokens) |
| gemini-3-flash-preview | Frontier, balanced | 1M | $0.50 / $3 |
| gemini-3.1-pro-preview | Best reasoning, complex agentic | 1M | $4 / $18 |

NeuralCAD uses `gemini-2.5-flash` by default. In multi-agent pipelines it costs 5-10x less than frontier models: about $0.05 per user turn with 8 API calls.

## Function Calling

### Declaration

```python
from google import genai
from google.genai import types

# One Tool can bundle several function declarations; "parameters" uses an
# OpenAPI-style subset of JSON Schema.
tools = types.Tool(function_declarations=[{
    "name": "generate_toolpath",
    "description": "Generate CNC toolpath from CadQuery shape",
    "parameters": {
        "type": "object",
        "properties": {
            "operations": {
                "type": "array",
                "items": {"type": "string",
                          "enum": ["pocket", "profile", "drill", "adaptive"]}
            },
            "tool_diameter": {"type": "number"},
            "post_processor": {"type": "string",
                               "enum": ["grbl", "linuxcnc", "fanuc"]}
        },
        "required": ["operations", "tool_diameter"]
    }
}])
```
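
The declaration tells the model what a valid call looks like, but the application may still want to re-check the arguments before anything reaches the CNC side. A minimal sketch, assuming you validate each returned call against the same parameter schema; `TOOLPATH_PARAMS` and `validate_args` are hypothetical helpers (not part of `google-genai`) and cover only the schema subset used above: `required`, `type`, `enum`, and arrays of strings.

```python
from typing import Any

# Parameter schema copied from the generate_toolpath declaration above
TOOLPATH_PARAMS: dict[str, Any] = {
    "type": "object",
    "properties": {
        "operations": {
            "type": "array",
            "items": {"type": "string",
                      "enum": ["pocket", "profile", "drill", "adaptive"]},
        },
        "tool_diameter": {"type": "number"},
        "post_processor": {"type": "string",
                           "enum": ["grbl", "linuxcnc", "fanuc"]},
    },
    "required": ["operations", "tool_diameter"],
}

# JSON Schema type name -> accepted Python types (only the types used above)
_TYPES = {"string": str, "number": (int, float), "array": list, "object": dict}


def _check(value: Any, schema: dict[str, Any], path: str) -> list[str]:
    """Return error strings for one value against one (sub)schema."""
    expected = _TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(value, expected) or isinstance(value, bool):
        return [f"{path}: expected {schema['type']}"]
    errors: list[str] = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in {schema['enum']}")
    if schema["type"] == "array":
        for i, item in enumerate(value):
            errors.extend(_check(item, schema["items"], f"{path}[{i}]"))
    return errors


def validate_args(args: dict[str, Any], params: dict[str, Any]) -> list[str]:
    """Check a function call's args against a declaration's parameters."""
    errors = [f"missing required field: {name}"
              for name in params.get("required", []) if name not in args]
    for name, value in args.items():
        if name not in params["properties"]:
            errors.append(f"unknown field: {name}")
        else:
            errors.extend(_check(value, params["properties"][name], name))
    return errors
```

An empty list means the call can be executed; otherwise the error strings can be returned in the function response so the model can correct its call on the next turn.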

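### Call-Response Loop

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment
config = types.GenerateContentConfig(tools=[tools])
contents = [types.Content(role="user", parts=[types.Part(text=prompt)])]

response = client.models.generate_content(
    model="gemini-2.5-flash", contents=contents, config=config
)
model_turn = response.candidates[0].content
contents.append(model_turn)  # keep the function_call parts in the history

result_parts = []
for part in model_turn.parts:
    if part.function_call:
        fn = part.function_call
        result = execute_function(fn.name, **(fn.args or {}))
        result_parts.append(types.Part(function_response=types.FunctionResponse(
            id=fn.id, name=fn.name, response={"result": result}
        )))

# Send all results back in one turn so the model can continue
if result_parts:
    contents.append(types.Content(role="user", parts=result_parts))
    response = client.models.generate_content(
        model="gemini-2.5-flash", contents=contents, config=config
    )
```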
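
The loop above calls an `execute_function` dispatcher that the snippet does not define. A minimal sketch of one way to write it, assuming a plain dictionary registry from declared function names to Python handlers; `TOOL_REGISTRY`, `register`, and the stub `generate_toolpath` handler are hypothetical placeholders, not NeuralCAD code. Failures are returned as data instead of raised, so the model sees them in the function response.

```python
from typing import Any, Callable

# Hypothetical registry: declared function name -> Python handler
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that registers a handler under its declared name."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOL_REGISTRY[name] = fn
        return fn
    return wrap


@register("generate_toolpath")
def generate_toolpath(operations: list[str], tool_diameter: float,
                      post_processor: str = "grbl") -> dict[str, Any]:
    # Placeholder body: a real handler would call the CAM backend here
    return {"operations": operations, "tool_diameter": tool_diameter,
            "post_processor": post_processor, "status": "queued"}


def execute_function(name: str, **kwargs: Any) -> Any:
    """Dispatch a model-requested call; report failures as data, not exceptions."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return {"error": f"unknown function: {name}"}
    try:
        return handler(**kwargs)
    except TypeError as exc:  # e.g. wrong or missing arguments from the model
        return {"error": str(exc)}
```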

### Modes

| Mode | Behavior |
|------|----------|
| AUTO | Model decides whether to call a function or respond with text (default) |
| ANY | Model is forced to call a function on every turn |
| VALIDATED | Model may call functions or respond with text, but function calls must follow the declared schema |
| NONE | Function calling is disabled |

`allowed_function_names` restricts which declared functions the model may select on a given turn.
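
For example, to force the CAM stage to answer with a `generate_toolpath` call instead of prose, the mode and the allow-list go into a `ToolConfig`. A config sketch, assuming a recent `google-genai` SDK; the stand-in declaration omits the parameters for brevity.

```python
from google.genai import types

# Stand-in for the full generate_toolpath declaration (parameters omitted)
toolpath_tool = types.Tool(function_declarations=[{
    "name": "generate_toolpath",
    "description": "Generate CNC toolpath from CadQuery shape",
}])

config = types.GenerateContentConfig(
    tools=[toolpath_tool],
    tool_config=types.ToolConfig(
        function_calling_config=types.FunctionCallingConfig(
            mode="ANY",  # the model must return a function call
            allowed_function_names=["generate_toolpath"],  # ...and only this one
        )
    ),
)
```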

### Parallel Calls

Gemini can request several independent function calls in a single response, each with its own `id`. Execute all of them, then return every result in one follow-up turn, pairing each function response with its call by `id`.
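
Because the calls are independent, they can also run concurrently. A minimal sketch, assuming a hypothetical `Call` dataclass that mirrors the `id`, `name`, and `args` fields of a `function_call` part and handlers that are safe to run in threads; results come back keyed by `id`, so completion order does not matter.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Call:
    """Stand-in for a function_call part: id, name, args."""
    id: str
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def run_parallel(calls: list[Call],
                 handlers: dict[str, Callable[..., Any]]) -> dict[str, Any]:
    """Execute independent calls concurrently; return results keyed by call id."""
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        # Assumes every call.name has a handler and every id is unique
        futures = {c.id: pool.submit(handlers[c.name], **c.args) for c in calls}
        return {call_id: fut.result() for call_id, fut in futures.items()}
```

Each entry of the returned dict then becomes one function response part with the matching `id`.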

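### Limits

- Keep roughly 10-20 active tools per request; larger sets raise the risk of the model picking the wrong tool
- Keep temperature at the default 1.0 for Gemini 3 models; lower values can cause unexpected behavior such as looping
- Function declarations (names, descriptions, parameter schemas) count toward the input token limit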

## Thinking / Reasoning

### Gemini 2.5 (thinking_budget)

```python
from google.genai import types

config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_budget=8192)
)
```

| Model | Budget range (tokens) | Default |
|-------|-----------------------|---------|
| 2.5 Pro | 128-32,768 | Dynamic (`-1`); thinking cannot be disabled |
| 2.5 Flash | 0-24,576 | Dynamic (`-1`); set `0` to disable |
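
Because the valid range differs per model, a budget that suits one agent can be invalid for another. A minimal sketch of a guard, assuming the ranges in the table above and `-1` for dynamic thinking; `BUDGET_RANGES` and `check_thinking_budget` are hypothetical helpers, and the ranges should be re-checked against the current model docs.

```python
# (min, max, can_disable) per model, taken from the table above
BUDGET_RANGES: dict[str, tuple[int, int, bool]] = {
    "gemini-2.5-pro": (128, 32_768, False),
    "gemini-2.5-flash": (0, 24_576, True),
}


def check_thinking_budget(model: str, budget: int) -> int:
    """Return a valid thinking_budget for `model`, raising on impossible requests."""
    low, high, can_disable = BUDGET_RANGES[model]
    if budget == -1:  # dynamic thinking: the model chooses its own budget
        return budget
    if budget == 0 and not can_disable:
        raise ValueError(f"{model} cannot disable thinking")
    return min(max(budget, low), high)  # clamp into the supported range
```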

### Gemini 3.x (thinking_level)

```python
from google.genai import types

config = types.GenerateContentConfig(
    thinking_config=types.ThinkingConfig(thinking_level="high")
)
```

Levels: `minimal`, `low`, `medium`, `high`. The default is `high` for 3 Flash and 3.1 Pro.

Thinking tokens are billed as output tokens; the count is reported in `response.usage_metadata.thoughts_token_count`.

Thinking "significantly improves function calling performance": the model selects tools and infers parameters more reliably.
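
Since thoughts are billed at the output rate, a per-call cost estimate has to include them. A minimal sketch, assuming a hypothetical `Usage` dataclass that mirrors three `usage_metadata` fields (`prompt_token_count`, `candidates_token_count`, `thoughts_token_count`) and prices quoted per 1M tokens as in the model table:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    """Mirrors the relevant fields of response.usage_metadata."""
    prompt_token_count: int
    candidates_token_count: int
    thoughts_token_count: Optional[int] = None  # None when no thinking happened


def call_cost(usage: Usage, in_per_m: float, out_per_m: float) -> float:
    """Dollar cost of one call; thinking tokens are charged as output."""
    output_tokens = usage.candidates_token_count + (usage.thoughts_token_count or 0)
    return (usage.prompt_token_count * in_per_m + output_tokens * out_per_m) / 1_000_000
```

At 2.5 Pro prices, a 4K-in/2K-out call costs $0.025 without thinking; if an 8,192-token thinking budget is fully used, the same call costs about $0.107, more than four times as much.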

## Code Execution

The built-in sandbox runs Python with 43+ preinstalled libraries (numpy, scipy, matplotlib, sympy, etc.). CadQuery is NOT available there, so the sandbox suits mathematical calculations (feeds and speeds, chip load) but not CAD execution. Each execution is limited to 30 seconds.
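
This is the kind of calculation worth handing to the sandbox. A minimal sketch of the standard metric milling formulas, spindle speed `n = 1000 * Vc / (pi * D)` and table feed `Vf = n * z * fz`; the function names are illustrative, and any cutting speed or chip load you plug in should come from the tool manufacturer's data, not from this page.

```python
import math


def spindle_rpm(cutting_speed_m_min: float, tool_diameter_mm: float) -> float:
    """n = 1000 * Vc / (pi * D), with Vc in m/min and D in mm."""
    return 1000.0 * cutting_speed_m_min / (math.pi * tool_diameter_mm)


def feed_rate_mm_min(rpm: float, flutes: int, chip_load_mm: float) -> float:
    """Vf = n * z * fz: table feed from spindle speed, flute count, chip load."""
    return rpm * flutes * chip_load_mm


def chip_load_mm(feed_mm_min: float, rpm: float, flutes: int) -> float:
    """fz = Vf / (n * z): the inverse of feed_rate_mm_min."""
    return feed_mm_min / (rpm * flutes)
```

For instance, a 6 mm cutter at an illustrative Vc of 150 m/min turns at roughly 7,958 rpm.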

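## Google Agent Development Kit (ADK)

```python
from google.adk.agents import LlmAgent, LoopAgent, ParallelAgent, SequentialAgent

MODEL = "gemini-2.5-flash"

# Sequential pipeline: each agent stores its result in session state under output_key
# (instruction text elided)
pipeline = SequentialAgent(name="DesignPipeline", sub_agents=[
    LlmAgent(name="Designer", model=MODEL, instruction="...", output_key="design_spec"),
    LlmAgent(name="Engineer", model=MODEL, instruction="...", output_key="eng_spec"),
    LlmAgent(name="CADCoder", model=MODEL, instruction="...", output_key="cad_code",
             tools=[generate_cadquery]),
])

# Self-correction loop: stops after max_iterations, or earlier when a sub-agent
# escalates (e.g. a validator tool sets tool_context.actions.escalate = True on PASS)
loop = LoopAgent(
    name="CADRefineLoop",
    sub_agents=[cad_generator, code_validator],
    max_iterations=3,
)

# Parallel validation: the sub-agents run concurrently
reviews = ParallelAgent(name="Reviews",
                        sub_agents=[cnc_check, structural_check, cost_check])
```

ADK provides native Gemini integration, multi-language support (Python/TypeScript/Go/Java), and deployment to Vertex AI.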
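
To show the control flow of the three workflow patterns without depending on ADK, here is a framework-free sketch: agents are plain functions that read and write a shared `state` dict, loosely mirroring how `output_key` stores each agent's result in session state. Everything here (`Agent`, `sequential`, `loop`, `parallel`) is hypothetical and is not the ADK API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# An "agent" here is any function that reads/writes the shared state dict
Agent = Callable[[dict], None]


def sequential(*agents: Agent) -> Agent:
    """Run agents in order; later agents see earlier agents' outputs."""
    def run(state: dict) -> None:
        for agent in agents:
            agent(state)
    return run


def loop(*agents: Agent, max_iterations: int, done: Callable[[dict], bool]) -> Agent:
    """Repeat the agents until done(state) is true or the iteration cap is hit."""
    def run(state: dict) -> None:
        for _ in range(max_iterations):
            for agent in agents:
                agent(state)
            if done(state):
                break
    return run


def parallel(*agents: Agent) -> Agent:
    """Run independent agents concurrently on the same shared state."""
    def run(state: dict) -> None:
        with ThreadPoolExecutor() as pool:
            for future in [pool.submit(agent, state) for agent in agents]:
                future.result()  # re-raise any agent error
    return run
```

Unlike ADK, where a sub-agent ends the loop by escalating, this sketch checks the exit condition once per full pass, which keeps the stopping rule in one place.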

## Agentic Architecture for CNC

### Full Pipeline with Gemini

```
User Prompt
  -> Gemini (thinking=high) routes to agents via function calling
  -> Design/Engineering/CNC agents provide structured feedback
  -> CAD Coder generates CadQuery code (thinking_budget=8192)
  -> Execute CadQuery on server
       -> If error: feed back to Gemini, retry (up to 3 times)
       -> If success: validate_for_cnc
  -> CAM Agent selects operations/tools
  -> generate_toolpath via ocp-freecad-cam
  -> Return G-code + 3D preview data
```

### Self-Correction Performance

Gemini 2.0 Flash CadQuery generation: 53% success zero-shot, 85% with an error-feedback loop. NeuralCAD already implements this loop in `_execute_cad_code()`.
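
The error-feedback loop itself is short. A minimal sketch, assuming a `generate` callable that wraps the Gemini call and an `execute` callable that runs CadQuery and raises on failure; both are hypothetical stand-ins, and this is not the actual `_execute_cad_code()` implementation. The essential step is appending the failing code and its error to the next prompt so the model can repair its own output.

```python
import traceback
from typing import Any, Callable


def generate_with_retries(prompt: str,
                          generate: Callable[[str], str],
                          execute: Callable[[str], Any],
                          max_retries: int = 3) -> tuple[str, Any]:
    """Generate code, run it, and feed errors back until success or retries run out."""
    current_prompt = prompt
    for attempt in range(max_retries + 1):  # first try plus max_retries retries
        code = generate(current_prompt)
        try:
            return code, execute(code)
        except Exception:
            error = traceback.format_exc(limit=1)
            current_prompt = (f"{prompt}\n\nYour previous code failed "
                              f"(attempt {attempt + 1}):\n{code}\n\nError:\n{error}\n"
                              "Fix the code.")
    raise RuntimeError(f"CAD code still failing after {max_retries} retries")
```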

### Cost per Turn (8 API calls, ~4K in + ~2K out each)

| Model | Total per turn |
|-------|----------------|
| Gemini 2.5 Flash | $0.05 |
| Gemini 2.5 Pro | $0.20 |
| Claude Sonnet 4.6 | $0.34 |
| GPT-5.4 | $0.32 |
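
The per-turn figures follow from `calls * (tokens_in * input_price + tokens_out * output_price) / 1M`. A minimal sketch of that arithmetic, assuming the per-1M-token list prices from the model table (2.5 Flash taken as $0.30 / $2.50) and ignoring thinking tokens; because those are billed as output, real turns can cost more, especially on 2.5 Pro, where thinking cannot be disabled. `PRICES_PER_M` and `turn_cost` are illustrative names.

```python
# (input $, output $) per 1M tokens; assumed list prices, check current pricing
PRICES_PER_M: dict[str, tuple[float, float]] = {
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-pro": (1.25, 10.00),
    "gemini-3-flash-preview": (0.50, 3.00),
    "gemini-3.1-pro-preview": (4.00, 18.00),
}


def turn_cost(model: str, calls: int = 8,
              tokens_in: int = 4_000, tokens_out: int = 2_000) -> float:
    """Estimated dollars per user turn, excluding thinking tokens."""
    price_in, price_out = PRICES_PER_M[model]
    return calls * (tokens_in * price_in + tokens_out * price_out) / 1_000_000


for name in PRICES_PER_M:
    print(f"{name}: ${turn_cost(name):.3f}")
```

With these defaults, 2.5 Flash comes to about $0.05 per turn and 2.5 Pro to about $0.20.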