Add files using upload-large-folder tool
This view is limited to 50 files because it contains too many changes.
- text_content/docs_api-key_24760cd6.txt +5 -0
- text_content/docs_api-key_33a6f0bf.txt +5 -0
- text_content/docs_api-key_415bf51e.txt +5 -0
- text_content/docs_audio_b006d05e.txt +5 -0
- text_content/docs_audio_ce171fe5.txt +5 -0
- text_content/docs_available-regions_eddec083.txt +5 -0
- text_content/docs_batch-mode_18ed6717.txt +5 -0
- text_content/docs_batch-mode_dd3e59d0.txt +5 -0
- text_content/docs_billing_c19877aa.txt +5 -0
- text_content/docs_caching_eeebb99e.txt +5 -0
- text_content/docs_code-execution_61a8afb4.txt +5 -0
- text_content/docs_downloads_a1919083.txt +5 -0
- text_content/docs_embeddings_73f381c0.txt +5 -0
- text_content/docs_ephemeral-tokens_d7b3db56.txt +5 -0
- text_content/docs_files_2dd3ac4c.txt +5 -0
- text_content/docs_files_7ee302e6.txt +5 -0
- text_content/docs_function-calling_5268a3c8.txt +5 -0
- text_content/docs_function-calling_84826ed5.txt +5 -0
- text_content/docs_function-calling_8cc46b89.txt +5 -0
- text_content/docs_function-calling_dbd56c28.txt +5 -0
- text_content/docs_function-calling_f7913038.txt +5 -0
- text_content/docs_google-search_5a931e9e.txt +5 -0
- text_content/docs_google-search_c7ceaf0e.txt +5 -0
- text_content/docs_grounding_06db198b.txt +5 -0
- text_content/docs_image-generation_15f086c8.txt +5 -0
- text_content/docs_image-generation_55547714.txt +5 -0
- text_content/docs_image-generation_56ff9b7f.txt +5 -0
- text_content/docs_image-generation_74d7b467.txt +5 -0
- text_content/docs_image-understanding_1996580a.txt +5 -0
- text_content/docs_image-understanding_26d88c6b.txt +5 -0
- text_content/docs_live-guide_df92dcd6.txt +5 -0
- text_content/docs_live-tools_141b2194.txt +5 -0
- text_content/docs_live_1db6d544.txt +5 -0
- text_content/docs_live_9c2b1926.txt +5 -0
- text_content/docs_live_e7a4e219.txt +5 -0
- text_content/docs_migrate_8812abe1.txt +5 -0
- text_content/docs_model-tuning_dfd6a211.txt +5 -0
- text_content/docs_models_08e6e52a.txt +5 -0
- text_content/docs_models_1ac37306.txt +5 -0
- text_content/docs_models_214b6f90.txt +5 -0
- text_content/docs_models_4b62ed0f.txt +5 -0
- text_content/docs_models_8d3d236a.txt +5 -0
- text_content/docs_models_b7a54a26.txt +5 -0
- text_content/docs_models_d8a3b996.txt +5 -0
- text_content/docs_openai_81698f40.txt +5 -0
- text_content/docs_pricing_a0fd1cc1.txt +5 -0
- text_content/docs_pricing_b1f23843.txt +5 -0
- text_content/docs_pricing_cda76b07.txt +5 -0
- text_content/docs_prompting-strategies_313b4935.txt +5 -0
- text_content/docs_prompting-strategies_44c9eff4.txt +5 -0
text_content/docs_api-key_24760cd6.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://ai.google.dev/gemini-api/docs/api-key#main-content
Title: Using Gemini API keys | Google AI for Developers
==================================================

Using Gemini API keys

To use the Gemini API, you need an API key. You can create a key for free with a few clicks in Google AI Studio.

Once you have an API key, you have the following options to connect to the Gemini API:

- Setting your API key as an environment variable
- Providing your API key explicitly

For initial testing, you can hard code an API key, but this should only be temporary since it's not secure. You can find examples of hard coding the API key in the "Providing API key explicitly" section.

Setting API key as environment variable

If you set the environment variable GEMINI_API_KEY or GOOGLE_API_KEY, the client automatically picks up the API key when you use one of the Gemini API libraries. It's recommended that you set only one of those variables, but if both are set, GOOGLE_API_KEY takes precedence.

If you're using the REST API, or JavaScript in the browser, you need to provide the API key explicitly.

Here is how you can set your API key locally as the environment variable GEMINI_API_KEY on different operating systems.

Linux/macOS - Bash

Bash is a common Linux and macOS terminal configuration. You can check whether you have a configuration file for it by running the following command:

    ~/.bashrc

If the response is "No such file or directory", you need to create this file and open it by running the following commands, or use zsh:

    touch ~/.bashrc
    open ~/.bashrc

Next, set your API key by adding the following export command:

    export GEMINI_API_KEY=<YOUR_API_KEY_HERE>

After saving the file, apply the changes by running:

    source ~/.bashrc

macOS - Zsh

Zsh is a common Linux and macOS terminal configuration. You can check whether you have a configuration file for it by running the following command:

    ~/.zshrc

If the response is "No such file or directory", you need to create this file and open it by running the following commands, or use bash:

    touch ~/.zshrc
    open ~/.zshrc

Next, set your API key by adding the following export command:

    export GEMINI_API_KEY=<YOUR_API_KEY_HERE>

After saving the file, apply the changes by running:

    source ~/.zshrc

Windows

1. Search for "Environment Variables" in the system settings.
2. Edit either "User variables" (for the current user) or "System variables" (for all users - use with caution).
3. Create a variable named GEMINI_API_KEY and set its value to your key.
4. Apply the changes.

Providing API key explicitly

In some cases, you may want to explicitly provide an API key. For example:

- You're doing a simple API call and prefer hard coding the API key.
- You want explicit control without having to rely on automatic discovery of environment variables by the Gemini API libraries.
- You're using an environment where environment variables are not supported (e.g., web) or you are making REST calls.

Below are examples of how you can provide an API key explicitly:

Python

    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")

    response = client.models.generate_content(
        model="gemini-2.5-flash", contents="Explain how AI works in a few words"
    )
    print(response.text)

JavaScript

    import { GoogleGenAI } from "@google/genai";

    const ai = new GoogleGenAI({ apiKey: "YOUR_API_KEY" });

    async function main() {
      const response = await ai.models.generateContent({
        model: "gemini-2.5-flash",
        contents: "Explain how AI works in a few words",
      });
      console.log(response.text);
    }

    main();

Go

    package main

    import (
        "context"
        "fmt"
        "log"

        "google.golang.org/genai"
    )

    func main() {
        ctx := context.Background()
        client, err := genai.NewClient(ctx, &genai.ClientConfig{
            APIKey:  "YOUR_API_KEY",
            Backend: genai.BackendGeminiAPI,
        })
        if err != nil {
            log.Fatal(err)
        }

        result, err := client.Models.GenerateContent(
            ctx,
            "gemini-2.5-flash",
            genai.Text("Explain how AI works in a few words"),
            nil,
        )
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(result.Text())
    }

Java

    package com.example;

    import com.google.genai.Client;
    import com.google.genai.types.GenerateContentResponse;

    public class GenerateTextFromTextInput {
      public static void main(String[] args) {
        Client client = Client.builder().apiKey("YOUR_API_KEY").build();

        GenerateContentResponse response =
            client.models.generateContent(
                "gemini-2.5-flash",
                "Explain how AI works in a few words",
                null);

        System.out.println(response.text());
      }
    }

REST

    curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$YOUR_API_KEY" \
      -H 'Content-Type: application/json' \
      -X POST \
      -d '{
        "contents": [
          {
            "parts": [
              {
                "text": "Explain how AI works in a few words"
              }
            ]
          }
        ]
      }'

Keep your API key secure

Treat your Gemini API key like a password. If compromised, others can use your project's quota, incur charges (if billing is enabled), and access your private data, such as files.

Critical security rules

- Never commit API keys to source control. Do not check your API key into version control systems like Git.
- Never expose API keys on the client-side. Do not use your API key directly in web or mobile apps in production. Keys in client-side code (including our JavaScript/TypeScript libraries and REST calls) can be extracted.

Best practices

- Use server-side calls with API keys: The most secure way to use your API key is to call the Gemini API from a server-side application where the key can be kept confidential.
- Use ephemeral tokens for client-side access (Live API only): For direct client-side access to the Live API, you can use ephemeral tokens. They come with lower security risks and can be suitable for production use. Review the ephemeral tokens guide for more information.
- Consider adding restrictions to your key: You can limit a key's permissions by adding API key restrictions. This minimizes the potential damage if the key is ever leaked.

For some general best practices, you can also review this support article.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-06-27 UTC.
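
The environment-variable precedence described under "Setting API key as environment variable" can be illustrated with a small sketch. It only mirrors the documented rule (GOOGLE_API_KEY wins when both variables are set); `resolve_api_key` is a hypothetical helper written for this illustration and is not part of any Gemini library, whose actual lookup code may differ.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key found in the environment, or raise if none is set.

    Hypothetical helper: it follows the documented precedence
    (GOOGLE_API_KEY before GEMINI_API_KEY) but is not SDK code.
    """
    env = os.environ if env is None else env
    # Check the higher-precedence variable first.
    for name in ("GOOGLE_API_KEY", "GEMINI_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("Set GEMINI_API_KEY or GOOGLE_API_KEY before calling the API.")
```

Passing a plain dictionary instead of the real environment, for example `resolve_api_key({"GEMINI_API_KEY": "abc"})`, makes the rule easy to check without touching shell configuration. Keeping the key out of source code this way also fits the advice never to commit keys to version control.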
text_content/docs_api-key_33a6f0bf.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://ai.google.dev/gemini-api/docs/api-key
Title: Using Gemini API keys | Google AI for Developers
==================================================

text_content/docs_api-key_415bf51e.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://ai.google.dev/gemini-api/docs/api-key#set-api-env-var
Title: Using Gemini API keys | Google AI for Developers
==================================================

text_content/docs_audio_b006d05e.txt
ADDED
@@ -0,0 +1,5 @@
URL: https://ai.google.dev/gemini-api/docs/audio#main-content
Title: Audio understanding | Gemini API | Google AI for Developers
==================================================

Audio understanding | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Audio understanding Gemini can analyze and understand audio input, enabling use cases like the following: Describe, summarize, or answer questions about audio content. Provide a transcription of the audio. Analyze specific segments of the audio. This guide shows you how to use the Gemini API to generate a text response to audio input. Before you begin Before calling the Gemini API, ensure you have your SDK of choice installed, and a Gemini API key configured and ready to use. Input audio You can provide audio data to Gemini in the following ways: Upload an audio file before making a request to generateContent . Pass inline audio data with the request to generateContent . Upload an audio file You can use the Files API to upload an audio file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. The following code uploads an audio file and then uses the file in a call to generateContent . Python from google import genai client = genai . Client () myfile = client . files . upload ( file = "path/to/sample.mp3" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "Describe this audio clip" , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mp3" }, }); const response = await ai . models . 
generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Describe this audio clip" , ]), }); console . log ( response . text ); } await main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Describe this audio clip" ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST AUDIO_PATH = "path/to/sample.mp3" MIME_TYPE = $( file -b --mime-type " ${ AUDIO_PATH } " ) NUM_BYTES = $( wc -c < " ${ AUDIO_PATH } " ) DISPLAY_NAME = AUDIO tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. 
curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ AUDIO_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Describe this audio clip"}, {"file_data":{"mime_type": "${MIME_TYPE}", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json To learn more about working with media files, see Files API . Pass audio data inline Instead of uploading an audio file, you can pass inline audio data in the request to generateContent : Python from google.genai import types with open ( 'path/to/small-sample.mp3' , 'rb' ) as f : audio_bytes = f . read () response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ 'Describe this audio clip' , types . Part . from_bytes ( data = audio_bytes , mime_type = 'audio/mp3' , ) ] ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const base64AudioFile = fs . readFileSync ( "path/to/small-sample.mp3" , { encoding : "base64" , }); const contents = [ { text : "Please summarize the audio." }, { inlineData : { mimeType : "audio/mp3" , data : base64AudioFile , }, }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents , }); console . log ( response . text ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . 
Fatal ( err ) } audioBytes , _ := os . ReadFile ( "/path/to/small-sample.mp3" ) parts := [] * genai . Part { genai . NewPartFromText ( "Describe this audio clip" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "audio/mp3" , Data : audioBytes , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } A few things to keep in mind about inline audio data: The maximum request size is 20 MB, which includes text prompts, system instructions, and files provided inline. If your file's size will make the total request size exceed 20 MB, then use the Files API to upload an audio file for use in the request. If you're using an audio sample multiple times, it's more efficient to upload an audio file . Get a transcript To get a transcript of audio data, just ask for it in the prompt: Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) prompt = 'Generate a transcript of the speech.' response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ prompt , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Generate a transcript of the speech." , ]), }); console . log ( "result.text=" , result . text ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . 
Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Generate a transcript of the speech." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } Refer to timestamps You can refer to specific sections of an audio file using timestamps of the form MM:SS . For example, the following prompt requests a transcript that Starts at 2 minutes 30 seconds from the beginning of the file. Ends at 3 minutes 29 seconds from the beginning of the file. Python # Create a prompt containing timestamps. prompt = "Provide a transcript of the speech from 02:30 to 03:29." JavaScript // Create a prompt containing timestamps. const prompt = "Provide a transcript of the speech from 02:30 to 03:29." Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Provide a transcript of the speech " + "between the timestamps 02:30 and 03:29." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } Count tokens Call the countTokens method to get a count of the number of tokens in an audio file. For example: Python response = client . models . 
count_tokens ( model = 'gemini-2.5-flash' , contents = [ myfile ] ) print ( response ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const countTokensResponse = await ai . models . countTokens ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), ]), }); console . log ( countTokensResponse . totalTokens ); Go package main import ( "context" "fmt" "log" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } tokens , _ := client . Models . CountTokens ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Printf ( "File %s is %d tokens\n" , localAudioPath , tokens . TotalTokens ) } Supported audio formats Gemini supports the following audio format MIME types: WAV - audio/wav MP3 - audio/mp3 AIFF - audio/aiff AAC - audio/aac OGG Vorbis - audio/ogg FLAC - audio/flac Technical details about audio Gemini represents each second of audio as 32 tokens; for example, one minute of audio is represented as 1,920 tokens. Gemini can "understand" non-speech components, such as birdsong or sirens. The maximum supported length of audio data in a single prompt is 9.5 hours. Gemini doesn't limit the number of audio files in a single prompt; however, the total combined length of all audio files in a single prompt can't exceed 9.5 hours. Gemini downsamples audio files to a 16 Kbps data resolution. 
If the audio source contains multiple channels, Gemini combines those channels into a single channel. What's next This guide shows how to generate text in response to audio data. To learn more, see the following resources: File prompting strategies : The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting. System instructions : System instructions let you steer the behavior of the model based on your specific needs and use cases. Safety guidance : Sometimes generative AI models produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
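The token arithmetic described under "Technical details about audio" can be sketched as a small helper. This is only an illustration built from the figures stated in this guide (32 tokens per second, 9.5 hours combined maximum); the function name `estimate_audio_tokens` and the constants are hypothetical and not part of the Gemini SDK. For an exact count, the guide's `countTokens` call remains the reference.

```python
import math

# Figures taken from this guide; the names are hypothetical.
TOKENS_PER_SECOND = 32
MAX_TOTAL_SECONDS = 9.5 * 60 * 60  # 9.5 hours of combined audio per prompt


def estimate_audio_tokens(durations_seconds):
    """Estimate the token count for one or more audio clips in a single prompt."""
    if any(d < 0 for d in durations_seconds):
        raise ValueError("durations must be non-negative")
    total_seconds = sum(durations_seconds)
    if total_seconds > MAX_TOTAL_SECONDS:
        raise ValueError("combined audio length exceeds 9.5 hours")
    # Round up so a partial second is never under-counted.
    return math.ceil(total_seconds * TOKENS_PER_SECOND)


print(estimate_audio_tokens([60]))  # one minute of audio -> 1920
```

The helper sums the clips before checking the limit because the guide states the 9.5-hour cap applies to the combined length, not to each file.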
|
text_content/docs_audio_ce171fe5.txt
ADDED
|
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/audio#upload-audio
|
| 2 |
+
Title: Audio understanding | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Audio understanding | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Audio understanding Gemini can analyze and understand audio input, enabling use cases like the following: Describe, summarize, or answer questions about audio content. Provide a transcription of the audio. Analyze specific segments of the audio. This guide shows you how to use the Gemini API to generate a text response to audio input. Before you begin Before calling the Gemini API, ensure you have your SDK of choice installed, and a Gemini API key configured and ready to use. Input audio You can provide audio data to Gemini in the following ways: Upload an audio file before making a request to generateContent . Pass inline audio data with the request to generateContent . Upload an audio file You can use the Files API to upload an audio file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. The following code uploads an audio file and then uses the file in a call to generateContent . Python from google import genai client = genai . Client () myfile = client . files . upload ( file = "path/to/sample.mp3" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "Describe this audio clip" , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mp3" }, }); const response = await ai . models . 
generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Describe this audio clip" , ]), }); console . log ( response . text ); } await main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Describe this audio clip" ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST AUDIO_PATH = "path/to/sample.mp3" MIME_TYPE = $( file -b --mime-type " ${ AUDIO_PATH } " ) NUM_BYTES = $( wc -c < " ${ AUDIO_PATH } " ) DISPLAY_NAME = AUDIO tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. 
curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ AUDIO_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Describe this audio clip"}, {"file_data":{"mime_type": "${MIME_TYPE}", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json To learn more about working with media files, see Files API . Pass audio data inline Instead of uploading an audio file, you can pass inline audio data in the request to generateContent : Python from google.genai import types with open ( 'path/to/small-sample.mp3' , 'rb' ) as f : audio_bytes = f . read () response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ 'Describe this audio clip' , types . Part . from_bytes ( data = audio_bytes , mime_type = 'audio/mp3' , ) ] ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const base64AudioFile = fs . readFileSync ( "path/to/small-sample.mp3" , { encoding : "base64" , }); const contents = [ { text : "Please summarize the audio." }, { inlineData : { mimeType : "audio/mp3" , data : base64AudioFile , }, }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents , }); console . log ( response . text ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . 
Fatal ( err ) } audioBytes , _ := os . ReadFile ( "/path/to/small-sample.mp3" ) parts := [] * genai . Part { genai . NewPartFromText ( "Describe this audio clip" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "audio/mp3" , Data : audioBytes , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } A few things to keep in mind about inline audio data: The maximum request size is 20 MB, which includes text prompts, system instructions, and files provided inline. If your file's size will make the total request size exceed 20 MB, then use the Files API to upload an audio file for use in the request. If you're using an audio sample multiple times, it's more efficient to upload an audio file . Get a transcript To get a transcript of audio data, just ask for it in the prompt: Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) prompt = 'Generate a transcript of the speech.' response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ prompt , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Generate a transcript of the speech." , ]), }); console . log ( "result.text=" , result . text ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . 
Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Generate a transcript of the speech." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } Refer to timestamps You can refer to specific sections of an audio file using timestamps of the form MM:SS . For example, the following prompt requests a transcript that Starts at 2 minutes 30 seconds from the beginning of the file. Ends at 3 minutes 29 seconds from the beginning of the file. Python # Create a prompt containing timestamps. prompt = "Provide a transcript of the speech from 02:30 to 03:29." JavaScript // Create a prompt containing timestamps. const prompt = "Provide a transcript of the speech from 02:30 to 03:29." Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Provide a transcript of the speech " + "between the timestamps 02:30 and 03:29." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } Count tokens Call the countTokens method to get a count of the number of tokens in an audio file. For example: Python response = client . models . 
count_tokens ( model = 'gemini-2.5-flash' , contents = [ myfile ] ) print ( response ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const countTokensResponse = await ai . models . countTokens ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), ]), }); console . log ( countTokensResponse . totalTokens ); Go package main import ( "context" "fmt" "log" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } tokens , _ := client . Models . CountTokens ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Printf ( "File %s is %d tokens\n" , localAudioPath , tokens . TotalTokens ) } Supported audio formats Gemini supports the following audio format MIME types: WAV - audio/wav MP3 - audio/mp3 AIFF - audio/aiff AAC - audio/aac OGG Vorbis - audio/ogg FLAC - audio/flac Technical details about audio Gemini represents each second of audio as 32 tokens; for example, one minute of audio is represented as 1,920 tokens. Gemini can "understand" non-speech components, such as birdsong or sirens. The maximum supported length of audio data in a single prompt is 9.5 hours. Gemini doesn't limit the number of audio files in a single prompt; however, the total combined length of all audio files in a single prompt can't exceed 9.5 hours. Gemini downsamples audio files to a 16 Kbps data resolution. 
If the audio source contains multiple channels, Gemini combines those channels into a single channel. What's next This guide shows how to generate text in response to audio data. To learn more, see the following resources: File prompting strategies : The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting. System instructions : System instructions let you steer the behavior of the model based on your specific needs and use cases. Safety guidance : Sometimes generative AI models produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs. Last updated 2025-06-27 UTC.
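The guide's rule for choosing between inline audio and the Files API (a 20 MB cap on the total request, counting text prompts, system instructions, and inline files) can be sketched as a simple size check. The helper name `should_upload_with_files_api` is hypothetical, and reading "20 MB" as 20 * 1024 * 1024 bytes is an assumption; the check also ignores any encoding overhead the transport may add, so leaving some margin is sensible.

```python
# Assumption: "20 MB" is interpreted as 20 MiB; the guide does not specify.
MAX_REQUEST_BYTES = 20 * 1024 * 1024


def should_upload_with_files_api(audio_size_bytes, prompt, system_instruction=""):
    """Return True when inline audio would push the request past the size cap."""
    text_bytes = len(prompt.encode("utf-8")) + len(system_instruction.encode("utf-8"))
    return audio_size_bytes + text_bytes > MAX_REQUEST_BYTES


# A 5 MiB clip fits inline; a 25 MiB clip should go through the Files API.
print(should_upload_with_files_api(5 * 1024 * 1024, "Describe this audio clip"))
print(should_upload_with_files_api(25 * 1024 * 1024, "Describe this audio clip"))
```

Even when a clip fits inline, the guide notes that uploading is more efficient if the same audio sample is reused across several requests.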
|
text_content/docs_available-regions_eddec083.txt
ADDED
|
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/available-regions#main-content
|
| 2 |
+
Title: Available regions for Google AI Studio and Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Available regions for Google AI Studio and Gemini API If you reached this page after trying to open Google AI Studio, it may be because Google AI Studio is not available in your region, or you don't meet the age requirements (18+) for access. You can learn more about the available regions in the following section and other requirements in the terms of service. Available regions Note: For Colab users - Region restrictions are applied based on the region that the Colab instance is in, not the region that the user is in. You can check the location of the Colab instance using !curl ipinfo.io The Gemini API and Google AI Studio are available in the following countries and territories. 
If you're not in one of these countries or territories, try the Gemini API in Vertex AI : Albania Algeria American Samoa Angola Anguilla Antarctica Antigua and Barbuda Argentina Armenia Aruba Australia Austria Azerbaijan The Bahamas Bahrain Bangladesh Barbados Belgium Belize Benin Bermuda Bhutan Bolivia Bosnia Botswana Brazil British Indian Ocean Territory British Virgin Islands Brunei Bulgaria Burkina Faso Burundi Cabo Verde Cambodia Cameroon Canada Caribbean Netherlands Cayman Islands Central African Republic Chad Chile Christmas Island Cocos (Keeling) Islands Colombia Comoros Cook Islands Côte d'Ivoire Costa Rica Croatia Curaçao Czech Republic Democratic Republic of the Congo Denmark Djibouti Dominica Dominican Republic Ecuador Egypt El Salvador Equatorial Guinea Eritrea Estonia Eswatini Ethiopia Falkland Islands (Islas Malvinas) Faroe Islands Fiji Finland France Gabon The Gambia Georgia Germany Ghana Gibraltar Greece Greenland Grenada Guam Guatemala Guernsey Guinea Guinea-Bissau Guyana Haiti Heard Island and McDonald Islands Herzegovina Honduras Hungary Iceland India Indonesia Iraq Ireland Isle of Man Israel Italy Jamaica Japan Jersey Jordan Kazakhstan Kenya Kiribati Kosovo Kyrgyzstan Kuwait Laos Latvia Lebanon Lesotho Liberia Libya Liechtenstein Lithuania Luxembourg Madagascar Malawi Malaysia Maldives Mali Malta Marshall Islands Mauritania Mauritius Mexico Micronesia Mongolia Montenegro Montserrat Morocco Mozambique Namibia Nauru Nepal Netherlands New Caledonia New Zealand Nicaragua Niger Nigeria Niue Norfolk Island North Macedonia Northern Mariana Islands Norway Oman Pakistan Palau Palestine Panama Papua New Guinea Paraguay Peru Philippines Pitcairn Islands Poland Portugal Puerto Rico Qatar Republic of Cyprus Republic of the Congo Romania Rwanda Saint Barthélemy Saint Kitts and Nevis Saint Lucia Saint Pierre and Miquelon Saint Vincent and the Grenadines Saint Helena, Ascension and Tristan da Cunha Samoa São Tomé and Príncipe Saudi Arabia Senegal Serbia 
Seychelles Sierra Leone Singapore Slovakia Slovenia Solomon Islands Somalia South Africa South Georgia and the South Sandwich Islands South Korea South Sudan Spain Sri Lanka Sudan Suriname Sweden Switzerland Taiwan Tajikistan Tanzania Thailand Timor-Leste Togo Tokelau Tonga Trinidad and Tobago Tunisia Türkiye Turkmenistan Turks and Caicos Islands Tuvalu Uganda Ukraine United Kingdom United Arab Emirates United States United States Minor Outlying Islands U.S. Virgin Islands Uruguay Uzbekistan Vanuatu Venezuela Vietnam Wallis and Futuna Western Sahara Yemen Zambia Zimbabwe Last updated 2025-02-25 UTC.
|
text_content/docs_batch-mode_18ed6717.txt
ADDED
|
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/batch-mode#main-content
|
| 2 |
+
Title: Batch Mode | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Batch Mode The Gemini API's Batch Mode is designed to process large volumes of requests asynchronously at 50% of the standard cost. The target turnaround time is 24 hours, but in the majority of cases, it is much quicker. Use Batch Mode for large-scale, non-urgent tasks such as data pre-processing or running evaluations where an immediate response is not required. Note: You can use Batch Mode with the Gemini API Python SDK or the REST API. Support for Batch Mode in the Gemini API JavaScript SDK is coming soon. Getting Started This section helps you get started with submitting your first requests in batch mode. Creating a batch job You have two ways to submit your requests in Batch Mode: Inline Requests : A list of GenerateContentRequest objects directly included in your batch creation request. This is suitable for smaller batches that keep the total request size under 20MB. The output returned from the model is a list of inlineResponse objects. Input File : A JSON Lines (JSONL) file where each line contains a complete GenerateContentRequest object. This method is recommended for larger requests. The output returned from the model is a JSONL file where each line is either a GenerateContentResponse or a status object. Inline requests For a small number of requests, you can directly embed the GenerateContentRequest objects within your BatchGenerateContentRequest . The following example calls the BatchGenerateContent method with inline requests: Python from google import genai from google.genai import types client = genai . 
Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Tell me a one-sentence joke.' }], 'role' : 'user' }] }, { 'contents' : [{ 'parts' : [{ 'text' : 'Why is the sky blue?' }], 'role' : 'user' }] } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "inlined-requests-job-1" , }, ) print ( f "Created batch job: { inline_batch_job . name } " ) REST curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -X POST \ -H "Content-Type:application/json" \ -d '{ "batch": { "display_name": "my-batch-requests", "input_config": { "requests": { "requests": [ { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-1" } }, { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-2" } } ] } } } }' You can use any request you would use in non-batch (or interactive) mode. For example, you could specify the temperature, system instructions, or even pass in other modalities. The following inline requests include a system instruction for one of the requests: inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cloud.' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cat.' }]}], 'system_instructions' : { 'parts' : [{ 'text' : 'You are a cat. Your name is Neko.' }]}} ] Similarly, you can also specify tools to use for a request. The following example shows a request that enables the Google Search tool : inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 1998?' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 2025?' 
}]}], 'tools' : [{ 'google_search' : {}}]} ] Input file For larger sets of requests, prepare a JSON Lines (JSONL) file. Each line in this file must be a JSON object containing a user-defined key and a request object, where the request is a valid GenerateContentRequest object. The user-defined key is used in the response to indicate which output is the result of which request. For example, the request with the key defined as request-1 will have its response annotated with the same key name. This file is uploaded using the File API . The maximum allowed file size for an input file is 2GB. The following is an example of a JSONL file. You can save it in a file named my-batch-requests.jsonl : { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}], "generation_config" : { "temperature" : 0.7 }}} { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} As with inline requests, you can specify other parameters such as system instructions, tools, or other configurations in each request JSON. You can upload this file using the File API as shown in the following example. If you are working with multimodal input, you can reference other uploaded files within your JSONL file. Python import json from google import genai from google.genai import types client = genai . Client () # Create a sample JSONL file with open ( "my-batch-requests.jsonl" , "w" ) as f : requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}]}}, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} ] for req in requests : f . write ( json . dumps ( req ) + "\n" ) # Upload the file to the File API uploaded_file = client . files . upload ( file = 'my-batch-requests.jsonl' , config = types . 
UploadFileConfig ( display_name = 'my-batch-requests' , mime_type = 'jsonl' ) ) print ( f "Uploaded file: { uploaded_file . name } " ) REST tmp_batch_input_file = batch_input.tmp echo -e '{"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generationConfig": {"temperature": 0.7}}\n{"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}' > batch_input.tmp MIME_TYPE = $( file -b --mime-type " ${ tmp_batch_input_file } " ) NUM_BYTES = $( wc -c < " ${ tmp_batch_input_file } " ) DISPLAY_NAME = BatchInput tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload URL is in the response headers; dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -D " ${ tmp_header_file } " \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/jsonl" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ tmp_batch_input_file } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) The following example calls the BatchGenerateContent method with the input file uploaded using File API: Python # Assumes `uploaded_file` is the file object from the previous step file_batch_job = client . batches . create ( model = "gemini-2.5-flash" , src = uploaded_file . name , config = { 'display_name' : "file-upload-job-1" , }, ) print ( f "Created batch job: { file_batch_job . 
name } " ) REST BATCH_INPUT_FILE = 'files/123456' # File ID curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -X POST \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" \ -d "{ 'batch': { 'display_name': 'my-batch-requests', 'input_config': { 'requests': { 'file_name': ${ BATCH_INPUT_FILE } } } } }" When you create a batch job, the response includes a job name. Use this name to monitor the job status and to retrieve the results once the job completes. The following is an example output that contains a job name: Created batch job from file: batches/123456789 Monitoring job status Use the operation name obtained when creating the batch job to poll its status. The state field of the batch job will indicate its current status. A batch job can be in one of the following states: JOB_STATE_PENDING : The job has been created and is waiting to be processed by the service. JOB_STATE_SUCCEEDED : The job completed successfully. You can now retrieve the results. JOB_STATE_FAILED : The job failed. Check the error details for more information. JOB_STATE_CANCELLED : The job was cancelled by the user. You can poll the job status periodically to check for completion. Python import time # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" # (e.g. 'batches/your-batch-id') completed_states = set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , ]) print ( f "Polling status for job: { job_name } " ) batch_job = client . batches . get ( name = job_name ) # Initial get while batch_job . state . name not in completed_states : print ( f "Current state: { batch_job . state . name } " ) time . sleep ( 30 ) # Wait for 30 seconds before polling again batch_job = client . batches . get ( name = job_name ) print ( f "Job finished with state: { batch_job . state . 
name } " ) if batch_job . state . name == 'JOB_STATE_FAILED' : print ( f "Error: { batch_job . error } " ) Retrieving results Once the job status indicates your batch job has succeeded, the results are available in the response field. Python import json # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" batch_job = client . batches . get ( name = job_name ) if batch_job . state . name == 'JOB_STATE_SUCCEEDED' : # If batch job was created with a file if batch_job . dest and batch_job . dest . file_name : # Results are in a file result_file_name = batch_job . dest . file_name print ( f "Results are in file: { result_file_name } " ) print ( "Downloading result file content..." ) file_content = client . files . download ( file = result_file_name ) # Process file_content (bytes) as needed print ( file_content . decode ( 'utf-8' )) # If batch job was created with inline request elif batch_job . dest and batch_job . dest . inlined_responses : # Results are inline print ( "Results are inline:" ) for i , inline_response in enumerate ( batch_job . dest . inlined_responses ): print ( f "Response { i + 1 } :" ) if inline_response . response : # Accessing response, structure may vary. try : print ( inline_response . response . text ) except AttributeError : print ( inline_response . response ) # Fallback elif inline_response . error : print ( f "Error: { inline_response . error } " ) else : print ( "No results found (neither file nor inline)." ) else : print ( f "Job did not succeed. Final state: { batch_job . state . name } " ) if batch_job . error : print ( f "Error: { batch_job . 
error } " ) REST BATCH_NAME = "batches/123456" # Your batch job name curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null > batch_status.json if jq -r '.done' batch_status.json | grep -q "false" ; then echo "Batch has not finished processing" fi batch_state = $( jq -r '.metadata.state' batch_status.json ) if [[ $batch_state = "JOB_STATE_SUCCEEDED" ]] ; then if [[ $( jq '.response | has("inlinedResponses")' batch_status.json ) = "true" ]] ; then jq -r '.response.inlinedResponses' batch_status.json exit fi responses_file_name = $( jq -r '.response.responsesFile' batch_status.json ) curl https://generativelanguage.googleapis.com/download/v1beta/ $responses_file_name :download?alt = media \ -H "x-goog-api-key: $GEMINI_API_KEY " 2 > /dev/null elif [[ $batch_state = "JOB_STATE_FAILED" ]] ; then jq '.error' batch_status.json elif [[ $batch_state == "JOB_STATE_CANCELLED" ]] ; then echo "Batch was cancelled by the user" fi Cancelling a batch job You can cancel an ongoing batch job using its name. When a job is canceled, it stops processing new requests. Python # Cancel a batch job client . batches . cancel ( name = batch_job_to_cancel . name ) REST BATCH_NAME = "batches/123456" # Your batch job name # Cancel the batch curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :cancel \ -H "x-goog-api-key: $GEMINI_API_KEY " \ # Confirm that the status of the batch after cancellation is JOB_STATE_CANCELLED curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null | jq -r '.metadata.state' Deleting a batch job You can delete an existing batch job using its name. When a job is deleted, it stops processing new requests and is removed from the list of batch jobs. Python # Delete a batch job client . batches . delete ( name = batch_job_to_delete . 
name ) REST BATCH_NAME = "batches/123456" # Your batch job name # Delete the batch curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :delete \ -H "x-goog-api-key: $GEMINI_API_KEY " Technical details Supported models: Batch Mode supports a range of Gemini models. Refer to the Models page for the latest list of compatible models. The supported modalities for Batch Mode are the same as those supported on the interactive (or non-batch mode) API. Pricing: Batch Mode usage is priced at 50% of the standard interactive API cost for the equivalent model. Service Level Objective (SLO): Batch jobs are designed to complete within a 24-hour turnaround time. Many jobs may complete much faster depending on their size and current system load. Caching: Context caching is enabled for batch requests. If a request in your batch results in a cache hit, the cached tokens are priced the same as for non-batch mode traffic. Best practices Use input files for large requests: For a large number of requests, always use the file input method for better manageability and to avoid hitting request size limits for the BatchGenerateContent call itself. Note that there is a 2GB file size limit per input file. Error handling: Check the batchStats for failedRequestCount after a job completes. If using file output, parse each line to check if it's a GenerateContentResponse or a status object indicating an error for that specific request. Submit jobs once: The creation of a batch job is not idempotent. If you send the same creation request twice, two separate batch jobs will be created. Break up very large batches: While the target turnaround time is 24 hours, actual processing time can vary based on system load and job size. For large jobs, consider breaking them into smaller batches if intermediate results are needed sooner. What's next Check out the batch mode notebook for more examples.
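The error-handling advice above (parse each line of the output file and decide whether it is a response or an error status) could be sketched as follows. This is a minimal illustration, not the SDK's own logic: the helper name `split_batch_results` is hypothetical, and the assumption that each output line carries the request `key` plus either a `response` field or an `error`/`status` field is inferred from the input file format described on this page.

```python
import json


def split_batch_results(jsonl_text):
    """Split result lines into successes and failures, keyed by request key.

    Assumption: a line with a "response" field succeeded; any other line
    is treated as a status object describing an error for that request.
    """
    successes, failures = {}, {}
    for line_no, line in enumerate(jsonl_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        # Fall back to the line number if a record has no user-defined key.
        key = record.get("key", f"line-{line_no}")
        if "response" in record:
            successes[key] = record["response"]
        else:
            failures[key] = record.get("error") or record.get("status") or record
    return successes, failures


# Hypothetical sample content standing in for a downloaded result file.
sample = "\n".join([
    json.dumps({"key": "request-1", "response": {"candidates": []}}),
    json.dumps({"key": "request-2", "error": {"code": 400, "message": "bad request"}}),
])
ok, failed = split_batch_results(sample)
print("succeeded:", sorted(ok))
print("failed:", sorted(failed))
```

Keying both dictionaries by the user-defined key makes it straightforward to resubmit only the failed requests in a follow-up batch.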
Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-07 UTC.
|
text_content/docs_batch-mode_dd3e59d0.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/batch-mode#retrieve-batch-results
|
| 2 |
+
Title: Batch Mode | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Batch Mode The Gemini API's Batch Mode is designed to process large volumes of requests asynchronously at 50% of the standard cost. The target turnaround time is 24 hours, but in the majority of cases, it is much quicker. Use Batch Mode for large-scale, non-urgent tasks such as data pre-processing or running evaluations where an immediate response is not required. Note: You can use Batch Mode with the Gemini API Python SDK or the REST API. Support for Batch Mode in the Gemini API JavaScript SDK is coming soon. Getting Started This section helps you get started with submitting your first requests in batch mode. Creating a batch job You have two ways to submit your requests in Batch Mode: Inline Requests: A list of GenerateContentRequest objects directly included in your batch creation request. This is suitable for smaller batches that keep the total request size under 20MB. The output returned from the model is a list of inlineResponse objects. Input File: A JSON Lines (JSONL) file where each line contains a complete GenerateContentRequest object. This method is recommended for larger requests. The output returned from the model is a JSONL file where each line is either a GenerateContentResponse or a status object. Inline requests For a small number of requests, you can directly embed the GenerateContentRequest objects within your BatchGenerateContentRequest. The following example calls the BatchGenerateContent method with inline requests: Python from google import genai from google.genai import types client = genai .
Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Tell me a one-sentence joke.' }], 'role' : 'user' }] }, { 'contents' : [{ 'parts' : [{ 'text' : 'Why is the sky blue?' }], 'role' : 'user' }] } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "inlined-requests-job-1" , }, ) print ( f "Created batch job: { inline_batch_job . name } " ) REST curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -X POST \ -H "Content-Type:application/json" \ -d '{ "batch": { "display_name": "my-batch-requests", "input_config": { "requests": { "requests": [ { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-1" } }, { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-2" } } ] } } } }' You can use any requests you would use in non-batch (or interactive) mode. For example, you could specify the temperature, system instructions or even pass in other modalities. The following example shows inline requests in which one request contains a system instruction: inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cloud.' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cat.' }]}], 'system_instructions' : { 'parts' : [{ 'text' : 'You are a cat. Your name is Neko.' }]}} ] Similarly, you can also specify tools to use for a request. The following example shows a request that enables the Google Search tool: inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 1998?' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 2025?' 
}]}], 'tools' : [{ 'google_search' : {}}]} ] Input file For larger sets of requests, prepare a JSON Lines (JSONL) file. Each line in this file must be a JSON object containing a user-defined key and a request object, where the request is a valid GenerateContentRequest object. The user-defined key is used in the response to indicate which output is the result of which request. For example, the request with the key defined as request-1 will have its response annotated with the same key name. This file is uploaded using the File API. The maximum allowed file size for an input file is 2GB. The following is an example of a JSONL file. You can save it in a file named my-batch-requests.jsonl : { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}], "generation_config" : { "temperature" : 0.7 }}} { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} As with inline requests, you can specify other parameters like system instructions, tools or other configurations in each request JSON. You can upload this file using the File API as shown in the following example. If you are working with multimodal input, you can reference other uploaded files within your JSONL file. Python import json from google import genai from google.genai import types client = genai . Client () # Create a sample JSONL file with open ( "my-batch-requests.jsonl" , "w" ) as f : requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}]}}, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} ] for req in requests : f . write ( json . dumps ( req ) + "\n" ) # Upload the file to the File API uploaded_file = client . files . upload ( file = 'my-batch-requests.jsonl' , config = types . 
UploadFileConfig ( display_name = 'my-batch-requests' , mime_type = 'jsonl' ) ) print ( f "Uploaded file: { uploaded_file . name } " ) REST tmp_batch_input_file = batch_input.tmp echo -e '{"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generationConfig": {"temperature": 0.7}}\n{"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}' > batch_input.tmp MIME_TYPE = $( file -b --mime-type " ${ tmp_batch_input_file } " ) NUM_BYTES = $( wc -c < " ${ tmp_batch_input_file } " ) DISPLAY_NAME = BatchInput tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files \ -D " ${ tmp_header_file } " \ -H " x-goog-api-key: $GEMINI_API_KEY " \ -H " X-Goog-Upload-Protocol: resumable " \ -H " X-Goog-Upload-Command: start " \ -H " X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H " X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H " Content-Type: application/jsonl " \ -d " { 'file' : { 'display_name' : '${DISPLAY_NAME}' }} " 2> /dev/null upload_url= $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H " Content-Length: ${ NUM_BYTES } " \ -H " X-Goog-Upload-Offset: 0 " \ -H " X-Goog-Upload-Command: upload, finalize " \ --data-binary " @ ${ tmp_batch_input_file } " 2> /dev/null > file_info.json file_uri= $( jq ".file.uri" file_info.json ) The following example calls the BatchGenerateContent method with the input file uploaded using File API: Python # Assumes `uploaded_file` is the file object from the previous step file_batch_job = client . batches . create ( model = "gemini-2.5-flash" , src = uploaded_file . name , config = { 'display_name' : "file-upload-job-1" , }, ) print ( f "Created batch job: { file_batch_job . 
name } " ) REST BATCH_INPUT_FILE = 'files/123456' # File ID curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -X POST \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" \ -d "{ 'batch': { 'display_name': 'my-batch-requests', 'input_config': { 'requests': { 'file_name': ${ BATCH_INPUT_FILE } } } } }" When you create a batch job, you will get a job name returned. Use this name for monitoring the job status as well as retrieving the results once the job completes. The following is an example output that contains a job name: Created batch job from file: batches/123456789 Monitoring job status Use the operation name obtained when creating the batch job to poll its status. The state field of the batch job will indicate its current status. A batch job can be in one of the following states: JOB_STATE_PENDING : The job has been created and is waiting to be processed by the service. JOB_STATE_SUCCEEDED : The job completed successfully. You can now retrieve the results. JOB_STATE_FAILED : The job failed. Check the error details for more information. JOB_STATE_CANCELLED : The job was cancelled by the user. You can poll the job status periodically to check for completion. Python # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" # (e.g. 'batches/your-batch-id') batch_job = client . batches . get ( name = job_name ) completed_states = set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , ]) print ( f "Polling status for job: { job_name } " ) batch_job = client . batches . get ( name = job_name ) # Initial get while batch_job . state . name not in completed_states : print ( f "Current state: { batch_job . state . name } " ) time . sleep ( 30 ) # Wait for 30 seconds before polling again batch_job = client . batches . get ( name = job_name ) print ( f "Job finished with state: { batch_job . state . 
name } " ) if batch_job . state . name == 'JOB_STATE_FAILED' : print ( f "Error: { batch_job . error } " ) Retrieving results Once the job status indicates your batch job has succeeded, the results are available in the response field. Python import json # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" batch_job = client . batches . get ( name = job_name ) if batch_job . state . name == 'JOB_STATE_SUCCEEDED' : # If batch job was created with a file if batch_job . dest and batch_job . dest . file_name : # Results are in a file result_file_name = batch_job . dest . file_name print ( f "Results are in file: { result_file_name } " ) print ( "Downloading result file content..." ) file_content = client . files . download ( file = result_file_name ) # Process file_content (bytes) as needed print ( file_content . decode ( 'utf-8' )) # If batch job was created with inline request elif batch_job . dest and batch_job . dest . inlined_responses : # Results are inline print ( "Results are inline:" ) for i , inline_response in enumerate ( batch_job . dest . inlined_responses ): print ( f "Response { i + 1 } :" ) if inline_response . response : # Accessing response, structure may vary. try : print ( inline_response . response . text ) except AttributeError : print ( inline_response . response ) # Fallback elif inline_response . error : print ( f "Error: { inline_response . error } " ) else : print ( "No results found (neither file nor inline)." ) else : print ( f "Job did not succeed. Final state: { batch_job . state . name } " ) if batch_job . error : print ( f "Error: { batch_job . 
error } " ) REST BATCH_NAME = "batches/123456" # Your batch job name curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null > batch_status.json if jq -r '.done' batch_status.json | grep -q "false" ; then echo "Batch has not finished processing" fi batch_state = $( jq -r '.metadata.state' batch_status.json ) if [[ $batch_state = "JOB_STATE_SUCCEEDED" ]] ; then if [[ $( jq '.response | has("inlinedResponses")' batch_status.json ) = "true" ]] ; then jq -r '.response.inlinedResponses' batch_status.json exit fi responses_file_name = $( jq -r '.response.responsesFile' batch_status.json ) curl https://generativelanguage.googleapis.com/download/v1beta/ $responses_file_name :download?alt = media \ -H "x-goog-api-key: $GEMINI_API_KEY " 2 > /dev/null elif [[ $batch_state = "JOB_STATE_FAILED" ]] ; then jq '.error' batch_status.json elif [[ $batch_state == "JOB_STATE_CANCELLED" ]] ; then echo "Batch was cancelled by the user" fi Cancelling a batch job You can cancel an ongoing batch job using its name. When a job is canceled, it stops processing new requests. Python # Cancel a batch job client . batches . cancel ( name = batch_job_to_cancel . name ) REST BATCH_NAME = "batches/123456" # Your batch job name # Cancel the batch curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :cancel \ -H "x-goog-api-key: $GEMINI_API_KEY " \ # Confirm that the status of the batch after cancellation is JOB_STATE_CANCELLED curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null | jq -r '.metadata.state' Deleting a batch job You can delete an existing batch job using its name. When a job is deleted, it stops processing new requests and is removed from the list of batch jobs. Python # Delete a batch job client . batches . delete ( name = batch_job_to_delete . 
name ) REST BATCH_NAME = "batches/123456" # Your batch job name # Delete the batch curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :delete \ -H "x-goog-api-key: $GEMINI_API_KEY " Technical details Supported models: Batch Mode supports a range of Gemini models. Refer to the Models page for the latest list of compatible models. The supported modalities for Batch Mode are the same as those supported on the interactive (or non-batch mode) API. Pricing: Batch Mode usage is priced at 50% of the standard interactive API cost for the equivalent model. Service Level Objective (SLO): Batch jobs are designed to complete within a 24-hour turnaround time. Many jobs may complete much faster depending on their size and current system load. Caching: Context caching is enabled for batch requests. If a request in your batch results in a cache hit, the cached tokens are priced the same as for non-batch mode traffic. Best practices Use input files for large requests: For a large number of requests, always use the file input method for better manageability and to avoid hitting request size limits for the BatchGenerateContent call itself. Note that there is a 2GB file size limit per input file. Error handling: Check the batchStats for failedRequestCount after a job completes. If using file output, parse each line to check if it's a GenerateContentResponse or a status object indicating an error for that specific request. Submit jobs once: The creation of a batch job is not idempotent. If you send the same creation request twice, two separate batch jobs will be created. Break up very large batches: While the target turnaround time is 24 hours, actual processing time can vary based on system load and job size. For large jobs, consider breaking them into smaller batches if intermediate results are needed sooner. What's next Check out the batch mode notebook for more examples.
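One way to act on the "break up very large batches" advice and the per-file size limit is to group requests into chunks before writing each JSONL file. The sketch below is illustrative only: `chunk_requests` is a hypothetical helper, and the tiny `max_bytes` value in the demo is chosen to force several chunks, not to reflect the real limit.

```python
import json


def chunk_requests(requests, max_bytes):
    """Group request dicts so each group's JSONL encoding stays within max_bytes."""
    chunks, current, current_size = [], [], 0
    for req in requests:
        # Size of this request as one JSONL line, including the newline.
        line_size = len((json.dumps(req) + "\n").encode("utf-8"))
        if line_size > max_bytes:
            raise ValueError("a single request exceeds max_bytes")
        # Start a new chunk when adding this line would overflow the current one.
        if current and current_size + line_size > max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(req)
        current_size += line_size
    if current:
        chunks.append(current)
    return chunks


# Demo with a deliberately small limit so that several chunks are produced.
demo_requests = [
    {"key": f"request-{i}",
     "request": {"contents": [{"parts": [{"text": "x" * 10}]}]}}
    for i in range(5)
]
chunks = chunk_requests(demo_requests, max_bytes=200)
for n, chunk in enumerate(chunks):
    print(f"chunk {n}: {[r['key'] for r in chunk]}")
```

Each chunk can then be written to its own JSONL file and submitted as a separate job. Because job creation is not idempotent, it is worth recording which chunks have already been submitted.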
Last updated 2025-07-07 UTC.
|
text_content/docs_billing_c19877aa.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/billing#enable-cloud-billing
|
| 2 |
+
Title: Billing | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Billing This guide provides an overview of different Gemini API billing options, explains how to enable billing and monitor usage, and provides answers to frequently asked questions (FAQs) about billing. Upgrade to the Gemini API paid tier About billing Billing for the Gemini API is based on two pricing tiers: free of charge (or free) and pay-as-you-go (or paid). Pricing and rate limits differ between these tiers and also vary by model. You can check out the rate limits and pricing pages for more info. For a model-by-model breakdown of capabilities, see the Gemini models page. How to request an upgrade To transition from the free tier to the pay-as-you-go plan, you need to enable billing for your Google Cloud project. The button you see in Google AI Studio depends on your project's current plan. If you're on the free tier, you'll see a Set up Billing button for your project. If you're already on the paid tier and meet the criteria for a plan change, you might see an Upgrade button. To start the process, follow these steps: Go to the AI Studio API keys page. Find the project you want to move to the paid plan and click either Set up Billing or Upgrade, depending on the button displayed. The next step depends on the button you clicked: If you clicked Set up Billing: You'll be redirected to the Google Cloud console to link a billing account to your project. Follow the on-screen instructions to complete the process. If you clicked Upgrade: The system will automatically verify your project's eligibility. 
If your project meets all the requirements, it will be instantly upgraded to the next tier. Why use the paid tier? When you enable billing and use the paid tier, you benefit from higher rate limits , and your prompts and responses aren't used to improve Google products. For more information on data use for paid services, see the terms of service . Cloud Billing The Gemini API uses Cloud Billing for billing services. To use the paid tier, you must set up Cloud Billing on your cloud project. After you've enabled Cloud Billing, you can use Cloud Billing tools to track spending, understand costs, make payments, and access Cloud Billing support. Enable billing You can enable Cloud Billing starting from Google AI Studio: Open Google AI Studio . In the bottom of the left sidebar, select Settings > Plan information . Click Set up Billing for your chosen project to enable Cloud Billing. Monitor usage After you enable Cloud Billing, you can monitor your usage of the Gemini API in the Google Cloud console . The service name for the API is generativelanguage.googleapis.com , and in the console the Gemini API is also referred to as the Generative Language API . To learn more, see the Google Cloud documentation on monitoring API usage . Frequently asked questions This section provides answers to frequently asked questions. What am I billed for? Gemini API pricing is based on the following: Input token count Output token count Cached token count Cached token storage duration For pricing information, see the pricing page . Where can I view my quota? You can view your quota and system limits in the Google Cloud console . How do I request more quota? To request more quota, follow the instructions at How to request an upgrade . Can I use the Gemini API for free in EEA (including EU), the UK, and CH? Yes, we make the free tier and paid tier available in many regions . If I set up billing with the Gemini API, will I be charged for my Google AI Studio usage? 
No, Google AI Studio usage remains free of charge in all supported regions, regardless of whether you set up billing. Can I use 1M tokens in the free tier? The free tier for the Gemini API differs based on the model selected. For now, you can try the 1M token context window in the following ways: In Google AI Studio With pay-as-you-go plans With free-of-charge plans for select models See the latest free-of-charge rate limits per model on the rate limits page. How can I calculate the number of tokens I'm using? Use the GenerativeModel.count_tokens method to count the number of tokens. Refer to the Tokens guide to learn more about tokens. Can I use my Google Cloud credits with the Gemini API? Yes, Google Cloud credits can be used towards Gemini API usage. How is billing handled? Billing for the Gemini API is handled by the Cloud Billing system. Am I charged for failed requests? If your request fails with a 400 or 500 error, you won't be charged for the tokens used. However, the request will still count against your quota. Is there a charge for fine-tuning the models? Model tuning is free, but inference on tuned models is charged at the same rate as the base models. Is GetTokens billed? Requests to the GetTokens API are not billed, and they don't count against inference quota. How is my Google AI Studio data handled if I have a paid API account? Refer to the terms for details on how data is handled when Cloud billing is enabled (see "How Google Uses Your Data" under "Paid Services"). Note that your Google AI Studio prompts are treated under the same "Paid Services" terms so long as at least 1 API project has billing enabled, which you can validate on the Gemini API Key page if you see any projects marked as "Paid" under "Plan". Where can I get help with billing? To get help with billing, see Get Cloud Billing support. 
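The four billing factors named in the FAQ (input tokens, output tokens, cached tokens, and cached token storage duration) can be combined into a rough estimate. The sketch below only shows the arithmetic. Every rate in it is a made-up placeholder, the `Rates` and `estimate_cost` names are hypothetical, and it assumes non-cached input tokens and cached tokens are counted separately; consult the pricing page for real figures and exact billing rules.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass
class Rates:
    """Placeholder per-million-token rates; not real Gemini API prices."""
    input_per_million: float
    output_per_million: float
    cached_per_million: float
    storage_per_million_per_hour: float


def estimate_cost(rates, input_tokens, output_tokens,
                  cached_tokens=0, cache_hours=0.0):
    """Sum the four billing factors into a single estimate."""
    cached_millions = cached_tokens / TOKENS_PER_MILLION
    return (
        input_tokens / TOKENS_PER_MILLION * rates.input_per_million
        + output_tokens / TOKENS_PER_MILLION * rates.output_per_million
        + cached_millions * rates.cached_per_million
        + cached_millions * rates.storage_per_million_per_hour * cache_hours
    )


# Hypothetical rates, chosen only to make the arithmetic easy to follow.
HYPOTHETICAL = Rates(
    input_per_million=1.0,
    output_per_million=2.0,
    cached_per_million=0.25,
    storage_per_million_per_hour=0.5,
)
cost = estimate_cost(HYPOTHETICAL, input_tokens=2_000_000, output_tokens=500_000,
                     cached_tokens=1_000_000, cache_hours=2)
print(f"estimated cost: {cost:.2f}")
```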
Last updated 2025-05-31 UTC.
|
text_content/docs_caching_eeebb99e.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/caching
|
| 2 |
+
Title: Context caching | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Context caching In a typical AI workflow, you might pass the same input tokens over and over to a model. The Gemini API offers two different caching mechanisms: Implicit caching (automatically enabled on Gemini 2.5 models, no cost saving guarantee) Explicit caching (can be manually enabled on most models, cost saving guarantee) Explicit caching is useful in cases where you want to guarantee cost savings, but with some added developer work. Implicit caching Implicit caching is enabled by default for all Gemini 2.5 models. We automatically pass on cost savings if your request hits caches. There is nothing you need to do in order to enable this. It is effective as of May 8th, 2025. The minimum input token count for context caching is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro. To increase the chance of an implicit cache hit: Try putting large and common contents at the beginning of your prompt Try to send requests with a similar prefix within a short amount of time You can see the number of tokens which were cache hits in the response object's usage_metadata field. Explicit caching Using the Gemini API explicit caching feature, you can pass some content to the model once, cache the input tokens, and then refer to the cached tokens for subsequent requests. At certain volumes, using cached tokens is lower cost than passing in the same corpus of tokens repeatedly. When you cache a set of tokens, you can choose how long you want the cache to exist before the tokens are automatically deleted. This caching duration is called the time to live (TTL). 
If not set, the TTL defaults to 1 hour. The cost for caching depends on the input token size and how long you want the tokens to persist. This section assumes that you've installed a Gemini SDK (or have curl installed) and that you've configured an API key, as shown in the quickstart . Explicit caching using the OpenAI library If you're using an OpenAI library , you can enable explicit caching using the cached_content property on extra_body . When to use explicit caching Context caching is particularly well suited to scenarios where a substantial initial context is referenced repeatedly by shorter requests. Consider using context caching for use cases such as: Chatbots with extensive system instructions Repetitive analysis of lengthy video files Recurring queries against large document sets Frequent code repository analysis or bug fixing How explicit caching reduces costs Context caching is a paid feature designed to reduce overall operational costs. Billing is based on the following factors: Cache token count: The number of input tokens cached, billed at a reduced rate when included in subsequent prompts. Storage duration: The amount of time cached tokens are stored (TTL), billed based on the TTL duration of cached token count. There are no minimum or maximum bounds on the TTL. Other factors: Other charges apply, such as for non-cached input tokens and output tokens. For up-to-date pricing details, refer to the Gemini API pricing page . To learn how to count tokens, see the Token guide . Additional considerations Keep the following considerations in mind when using context caching: The minimum input token count for context caching is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro. The maximum is the same as the maximum for the given model. (For more on counting tokens, see the Token guide ). The model doesn't make any distinction between cached tokens and regular input tokens. Cached content is a prefix to the prompt. 
There are no special rate or usage limits on context caching; the standard rate limits for GenerateContent apply, and token limits include cached tokens. The number of cached tokens is returned in the usage_metadata from the create, get, and list operations of the cache service, and also in GenerateContent when using the cache. Last updated 2025-06-27 UTC.
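The minimum token counts stated on this page (1,024 for 2.5 Flash and 2,048 for 2.5 Pro) lend themselves to a small pre-flight check before attempting to cache content. This is a sketch under assumptions: the helper name `meets_cache_minimum` is hypothetical, and the model identifier `gemini-2.5-pro` is assumed by analogy with `gemini-2.5-flash`, which appears elsewhere in these docs.

```python
# Minimum input token counts for context caching, as stated on this page.
# The "gemini-2.5-pro" identifier is an assumption; check the Models page.
MIN_CACHE_TOKENS = {
    "gemini-2.5-flash": 1024,
    "gemini-2.5-pro": 2048,
}


def meets_cache_minimum(model, token_count):
    """Return True if token_count reaches the caching minimum for the model."""
    if model not in MIN_CACHE_TOKENS:
        raise ValueError(f"no known caching minimum for model {model!r}")
    return token_count >= MIN_CACHE_TOKENS[model]


for model, tokens in [("gemini-2.5-flash", 1500), ("gemini-2.5-pro", 1500)]:
    verdict = "eligible" if meets_cache_minimum(model, tokens) else "too short"
    print(f"{model}: {tokens} tokens -> {verdict}")
```

The token count itself would come from a token-counting call, as described in the Token guide.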
|
text_content/docs_code-execution_61a8afb4.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/code-execution
|
| 2 |
+
Title: Code execution | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Code execution The Gemini API provides a code execution tool that enables the model to generate and run Python code. The model can then learn iteratively from the code execution results until it arrives at a final output. You can use code execution to build applications that benefit from code-based reasoning. For example, you can use code execution to solve equations or process text. You can also use the libraries included in the code execution environment to perform more specialized tasks. Gemini is only able to execute code in Python. You can still ask Gemini to generate code in another language, but the model can't use the code execution tool to run it. Enable code execution To enable code execution, configure the code execution tool on the model. This allows the model to generate and run code. Python from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What is the sum of the first 50 prime numbers? " "Generate and run code for the calculation, and make sure you get all 50." , config = types . GenerateContentConfig ( tools = [ types . Tool ( code_execution = types . ToolCodeExecution )] ), ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . 
output ) JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({}); let response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : [ "What is the sum of the first 50 prime numbers? " + "Generate and run code for the calculation, and make sure you get all 50." , ], config : { tools : [{ codeExecution : {} }], }, }); const parts = response ?. candidates ?.[ 0 ] ?. content ?. parts || []; parts . forEach (( part ) => { if ( part . text ) { console . log ( part . text ); } if ( part . executableCode && part . executableCode . code ) { console . log ( part . executableCode . code ); } if ( part . codeExecutionResult && part . codeExecutionResult . output ) { console . log ( part . codeExecutionResult . output ); } }); Go package main import ( "context" "fmt" "log" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateContentConfig { Tools : [] * genai . Tool { { CodeExecution : & genai . ToolCodeExecution {}}, }, } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , genai . Text ( "What is the sum of the first 50 prime numbers? " + "Generate and run code for the calculation, and make sure you get all 50." ), config , ) fmt . Println ( result . Text ()) fmt . Println ( result . ExecutableCode ()) fmt . Println ( result . CodeExecutionResult ()) } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY" \ -H 'Content-Type: application/json' \ -d ' {"tools": [{"code_execution": {}}], "contents": { "parts": { "text": "What is the sum of the first 50 prime numbers? Generate and run code for the calculation, and make sure you get all 50." } } }' Note: This REST example doesn't parse the JSON response as shown in the example output. 
The output might look something like the following, which has been formatted for readability: Okay, I need to calculate the sum of the first 50 prime numbers. Here's how I'll approach this: 1. **Generate Prime Numbers:** I'll use an iterative method to find prime numbers. I'll start with 2 and check if each subsequent number is divisible by any number between 2 and its square root. If not, it's a prime. 2. **Store Primes:** I'll store the prime numbers in a list until I have 50 of them. 3. **Calculate the Sum:** Finally, I'll sum the prime numbers in the list. Here's the Python code to do this: def is_prime(n): """Efficiently checks if a number is prime.""" if n <= 1: return False if n <= 3: return True if n % 2 == 0 or n % 3 == 0: return False i = 5 while i * i <= n: if n % i == 0 or n % (i + 2) == 0: return False i += 6 return True primes = [] num = 2 while len(primes) < 50: if is_prime(num): primes.append(num) num += 1 sum_of_primes = sum(primes) print(f'{primes=}') print(f'{sum_of_primes=}') primes=[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229] sum_of_primes=5117 The sum of the first 50 prime numbers is 5117. This output combines several content parts that the model returns when using code execution: text : Inline text generated by the model executableCode : Code generated by the model that is meant to be executed codeExecutionResult : Result of the executable code The naming conventions for these parts vary by programming language. Use code execution in chat You can also use code execution as part of a chat. Python from google import genai from google.genai import types client = genai . Client () chat = client . chats . create ( model = "gemini-2.5-flash" , config = types . GenerateContentConfig ( tools = [ types . Tool ( code_execution = types . 
ToolCodeExecution )] ), ) response = chat . send_message ( "I have a math question for you." ) print ( response . text ) response = chat . send_message ( "What is the sum of the first 50 prime numbers? " "Generate and run code for the calculation, and make sure you get all 50." ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . output ) JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({}); const chat = ai . chats . create ({ model : "gemini-2.5-flash" , history : [ { role : "user" , parts : [{ text : "I have a math question for you:" }], }, { role : "model" , parts : [{ text : "Great! I'm ready for your math question. Please ask away." }], }, ], config : { tools : [{ codeExecution : {}}], } }); const response = await chat . sendMessage ({ message : "What is the sum of the first 50 prime numbers? " + "Generate and run code for the calculation, and make sure you get all 50." }); console . log ( "Chat response:" , response . text ); Go package main import ( "context" "fmt" "log" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateContentConfig { Tools : [] * genai . Tool { { CodeExecution : & genai . ToolCodeExecution {}}, }, } chat , _ := client . Chats . Create ( ctx , "gemini-2.5-flash" , config , nil , ) result , _ := chat . SendMessage ( ctx , genai . Part { Text : "What is the sum of the first 50 prime numbers? " + "Generate and run code for the calculation, and " + "make sure you get all 50." , }, ) fmt . Println ( result . Text ()) fmt . Println ( result . ExecutableCode ()) fmt . Println ( result . 
CodeExecutionResult ()) } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY" \ -H 'Content-Type: application/json' \ -d '{"tools": [{"code_execution": {}}], "contents": [ { "role": "user", "parts": [{ "text": "Can you print \"Hello world!\"?" }] },{ "role": "model", "parts": [ { "text": "" }, { "executable_code": { "language": "PYTHON", "code": "\nprint(\"hello world!\")\n" } }, { "code_execution_result": { "outcome": "OUTCOME_OK", "output": "hello world!\n" } }, { "text": "I have printed \"hello world!\" using the provided python code block. \n" } ] },{ "role": "user", "parts": [{ "text": "What is the sum of the first 50 prime numbers? Generate and run code for the calculation, and make sure you get all 50." }] } ] }' Input/output (I/O) Starting with Gemini 2.0 Flash , code execution supports file input and graph output. Using these input and output capabilities, you can upload CSV and text files, ask questions about the files, and have Matplotlib graphs generated as part of the response. The output files are returned as inline images in the response. I/O pricing When using code execution I/O, you're charged for input tokens and output tokens: Input tokens: User prompt Output tokens: Code generated by the model Code execution output in the code environment Thinking tokens Summary generated by the model I/O details When you're working with code execution I/O, be aware of the following technical details: The maximum runtime of the code environment is 30 seconds. If the code environment generates an error, the model may decide to regenerate the code output. This can happen up to 5 times. The maximum file input size is limited by the model token window. In AI Studio, using Gemini 2.0 Flash, the maximum input file size is 1 million tokens (roughly 2 MB for text files of the supported input types). If you upload a file that's too large, AI Studio won't let you send it. 
Code execution works best with text and CSV files. The input file can be passed in part.inlineData or part.fileData (uploaded via the Files API ), and the output file is always returned as part.inlineData . Single turn Bidirectional (Multimodal Live API) Models supported All Gemini 2.0 and 2.5 models Only Flash experimental models File input types supported .png, .jpeg, .csv, .xml, .cpp, .java, .py, .js, .ts .png, .jpeg, .csv, .xml, .cpp, .java, .py, .js, .ts Plotting libraries supported Matplotlib, seaborn Matplotlib, seaborn Multi-tool use Yes (code execution + grounding only) Yes Billing There's no additional charge for enabling code execution from the Gemini API. You'll be billed at the current rate of input and output tokens based on the Gemini model you're using. Here are a few other things to know about billing for code execution: You're only billed once for the input tokens you pass to the model, and you're billed for the final output tokens returned to you by the model. Tokens representing generated code are counted as output tokens. Generated code can include text and multimodal output like images. Code execution results are also counted as output tokens. The billing model is shown in the following diagram: You're billed at the current rate of input and output tokens based on the Gemini model you're using. If Gemini uses code execution when generating your response, the original prompt, the generated code, and the result of the executed code are labeled intermediate tokens and are billed as input tokens . Gemini then generates a summary and returns the generated code, the result of the executed code, and the final summary. These are billed as output tokens . The Gemini API includes an intermediate token count in the API response, so you know why you're getting additional input tokens beyond your initial prompt. Limitations The model can only generate and execute code. It can't return other artifacts like media files. 
In some cases, enabling code execution can lead to regressions in other areas of model output (for example, writing a story). There is some variation in the ability of the different models to use code execution successfully. Supported libraries The code execution environment includes the following libraries: attrs chess contourpy fpdf geopandas imageio jinja2 joblib jsonschema jsonschema-specifications lxml matplotlib mpmath numpy opencv-python openpyxl packaging pandas pillow protobuf pylatex pyparsing PyPDF2 python-dateutil python-docx python-pptx reportlab scikit-learn scipy seaborn six striprtf sympy tabulate tensorflow toolz xlrd You can't install your own libraries. Note: Only matplotlib is supported for graph rendering using code execution. What's next Try the code execution Colab . Learn about other Gemini API tools: Function calling Grounding with Google Search Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
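The REST example on this page leaves the JSON response unparsed. A minimal sketch of that step follows, assuming the response body has already been decoded into a Python dict and that parts carry the camelCase keys named above (text, executableCode, codeExecutionResult). The helper name collect_parts and the sample payload are hypothetical and written by hand for illustration; they are not captured API output.

```python
from typing import Any


def collect_parts(response: dict[str, Any]) -> dict[str, list[str]]:
    """Group the parts of the first candidate into text, code, and output."""
    grouped: dict[str, list[str]] = {"text": [], "code": [], "output": []}
    candidates = response.get("candidates") or []
    if not candidates:
        return grouped
    parts = candidates[0].get("content", {}).get("parts", [])
    for part in parts:
        # Inline text generated by the model (empty strings are skipped).
        if part.get("text"):
            grouped["text"].append(part["text"])
        # Code the model generated for execution.
        if "executableCode" in part:
            grouped["code"].append(part["executableCode"].get("code", ""))
        # Output produced by running that code.
        if "codeExecutionResult" in part:
            grouped["output"].append(part["codeExecutionResult"].get("output", ""))
    return grouped


# Hypothetical payload, shaped after the part names listed on this page.
sample = {
    "candidates": [
        {
            "content": {
                "parts": [
                    {"text": "I'll compute it with code."},
                    {"executableCode": {"language": "PYTHON", "code": "print(2 + 2)"}},
                    {"codeExecutionResult": {"outcome": "OUTCOME_OK", "output": "4\n"}},
                    {"text": "The result is 4."},
                ]
            }
        }
    ]
}

grouped = collect_parts(sample)
for kind, values in grouped.items():
    print(kind, values)
```

If your client returns snake_case keys instead (as in the REST request bodies shown earlier), adjust the key names accordingly.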
|
text_content/docs_downloads_a1919083.txt
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/downloads#main-content
|
| 2 |
+
Title: Gemini API libraries | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Gemini API libraries | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Gemini API libraries When building with the Gemini API, we recommend using our official collection of libraries across major languages: the Google GenAI SDK . They are production ready under General Availability . Our samples and documentation across this site are built using these libraries. Note: If you're using one of our legacy libraries, we strongly recommend you migrate to the Google GenAI SDK. Review the legacy libraries section for more information. If you're new to the Gemini API, follow our quickstart guide to get started. Language support and installation The Google GenAI SDK is available for the Python, JavaScript/TypeScript, Go and Java languages. You can install each language's library using package managers, or visit their GitHub repos for further engagement: Python Library: google-genai GitHub Repository: googleapis/python-genai Installation: pip install google-genai JavaScript Library: @google/genai GitHub Repository: googleapis/js-genai Installation: npm install @google/genai Go Library: google.golang.org/genai GitHub Repository: googleapis/go-genai Installation: go get google.golang.org/genai Java Library: google-genai GitHub Repository: googleapis/java-genai Installation: If you're using Maven, add the following to your dependencies: <dependencies> <dependency> <groupId>com.google.genai</groupId> <artifactId>google-genai</artifactId> <version>1.0.0</version> </dependency> </dependencies> General availability We started rolling out the Google GenAI SDK in late 2024. As of May 2025, it reached General Availability (GA) across all supported platforms. 
This means the libraries are stable and fully supported for production use. They are actively maintained, provide access to the latest features, and offer the best performance when working with Gemini. If you're using one of our legacy libraries instead of the Google GenAI SDK, we strongly recommend that you migrate. Review the legacy libraries section for more information.
Legacy libraries and migration
If you are using one of our legacy libraries, we recommend that you migrate to the new libraries . The legacy libraries don't provide access to recent features (such as Live API and Veo ) and are on a deprecation path. They will stop receiving updates at the end of September 2025; after that, the feature gaps will grow and bugs may no longer get fixed. Each legacy library's support status is detailed in the following table:
Language | Legacy library | Support status | Recommended library
Python | google-generativeai | All support, including bug fixes, ends at the end of September 2025. | google-genai
JavaScript/TypeScript | @google/generativeai | All support, including bug fixes, ends at the end of September 2025. | @google/genai
Go | google.golang.org/generative-ai | All support, including bug fixes, ends at the end of September 2025. | google.golang.org/genai
Dart and Flutter | google_generative_ai | Not actively maintained | Use trusted community or third-party libraries, like firebase_ai , or access the API using REST
Swift | generative-ai-swift | Not actively maintained | Use Gemini in Firebase
Android | generative-ai-android | Not actively maintained | Use Gemini in Firebase
Note for Java developers: There was no legacy Google-provided Java SDK for the Gemini API, so no migration from a previous Google library is required. You can start directly with the new library in the Language support and installation section.
Last updated 2025-07-09 UTC.
|
text_content/docs_embeddings_73f381c0.txt
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/embeddings
|
| 2 |
+
Title: Embeddings | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Embeddings | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Embeddings Note: Introducing our first Gemini embedding model, available now to developers as gemini-embedding-exp-03-07 in the API. The Gemini API supports several embedding models that generate embeddings for words, phrases, code, and sentences. The resulting embeddings can then be used for tasks such as semantic search, text classification, and clustering, among many others. What are embeddings? Embeddings are numerical representations of text (or other media formats) that capture relationships between inputs. Text embeddings work by converting text into arrays of floating point numbers, called vectors . These vectors are designed to capture the meaning of the text. The length of the embedding array is called the vector's dimensionality . A passage of text might be represented by a vector containing hundreds of dimensions. Embeddings capture semantic meaning and context, which results in text with similar meanings having "closer" embeddings. For example, the sentence "I took my dog to the vet" and "I took my cat to the vet" would have embeddings that are close to each other in the vector space. You can use embeddings to compare different texts and understand how they relate. For example, if the embeddings of the text "cat" and "dog" are close together you can infer that these words are similar in meaning, context, or both. This enables a variety of common AI use cases . Before you begin Before calling the Gemini API, ensure you have your SDK of choice installed, and a Gemini API key configured and ready to use. 
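The idea of "closer" embeddings can be made concrete with cosine similarity, a common way to compare two vectors. The sketch below is a minimal illustration in plain Python: the three-dimensional vectors are made-up toy values, not output from any Gemini embedding model, and cosine_similarity is a hypothetical helper written for this example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumed values).
dog_to_vet = [0.9, 0.1, 0.3]
cat_to_vet = [0.8, 0.2, 0.35]
tax_return = [-0.2, 0.9, -0.4]

print(cosine_similarity(dog_to_vet, cat_to_vet))  # close to 1: similar meaning
print(cosine_similarity(dog_to_vet, tax_return))  # much lower: unrelated
```

Real embeddings have hundreds of dimensions or more, but the comparison works the same way: a score near 1 means the vectors point in nearly the same direction.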
Generate embeddings Use the embedContent method to generate text embeddings: Python from google import genai client = genai . Client () result = client . models . embed_content ( model = "gemini-embedding-exp-03-07" , contents = "What is the meaning of life?" ) print ( result . embeddings ) JavaScript import { GoogleGenAI } from "@google/genai" ; async function main () { const ai = new GoogleGenAI ({}); const response = await ai . models . embedContent ({ model : 'gemini-embedding-exp-03-07' , contents : 'What is the meaning of life?' , }); console . log ( response . embeddings ); } main (); Go package main import ( "context" "encoding/json" "fmt" "log" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } contents := [] * genai . Content { genai . NewContentFromText ( "What is the meaning of life?" , genai . RoleUser ), } result , err := client . Models . EmbedContent ( ctx , "gemini-embedding-exp-03-07" , contents , nil , ) if err != nil { log . Fatal ( err ) } embeddings , err := json . MarshalIndent ( result . Embeddings , "" , " " ) if err != nil { log . Fatal ( err ) } fmt . Println ( string ( embeddings )) } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-exp-03-07:embedContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d '{"model": "models/gemini-embedding-exp-03-07", "content": { "parts":[{ "text": "What is the meaning of life?"}]} }' You can also generate embeddings for multiple chunks at once by passing them in as a list of strings. Task types When building Retrieval Augmented Generation (RAG) systems, a common design is to use text embeddings to perform a similarity search. In some cases this can lead to degraded quality, because questions and their answers are not semantically similar. For example, a question like "Why is the sky blue?" 
and its answer "The scattering of sunlight causes the blue color," have distinctly different meanings as statements, which means that a RAG system won't automatically recognize their relation. Task types enable you to generate optimized embeddings for specific tasks, saving you time and cost and improving performance. Python from google import genai from google.genai import types client = genai . Client () result = client . models . embed_content ( model = "gemini-embedding-exp-03-07" , contents = "What is the meaning of life?" , config = types . EmbedContentConfig ( task_type = "SEMANTIC_SIMILARITY" ) ) print ( result . embeddings ) JavaScript import { GoogleGenAI } from "@google/genai" ; async function main () { const ai = new GoogleGenAI ({}); const response = await ai . models . embedContent ({ model : 'gemini-embedding-exp-03-07' , contents : 'What is the meaning of life?' , config : { taskType : "SEMANTIC_SIMILARITY" , } }); console . log ( response . embeddings ); } main (); REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-exp-03-07:embedContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d '{"model": "models/gemini-embedding-exp-03-07", "content": { "parts":[{ "text": "What is the meaning of life?"}]}, "taskType": "SEMANTIC_SIMILARITY" }' Supported task types Task type Description SEMANTIC_SIMILARITY Used to generate embeddings that are optimized to assess text similarity. CLASSIFICATION Used to generate embeddings that are optimized to classify texts according to preset labels. CLUSTERING Used to generate embeddings that are optimized to cluster texts based on their similarities. RETRIEVAL_DOCUMENT , RETRIEVAL_QUERY , QUESTION_ANSWERING , and FACT_VERIFICATION Used to generate embeddings that are optimized for document search or information retrieval. CODE_RETRIEVAL_QUERY Used to retrieve a code block based on a natural language query, such as sort an array or reverse a linked list. 
Embeddings of the code blocks are computed using RETRIEVAL_DOCUMENT .
Use cases
Text embeddings are used in a variety of common AI use cases, such as:
Information retrieval: You can use embeddings to retrieve semantically similar text given a piece of input text. Document search tutorial
Clustering: Comparing groups of embeddings can help identify hidden trends. Embedding clustering tutorial
Vector database: As you take different embedding use cases to production, it is common to store embeddings in a vector database. Vector database tutorial
Classification: You can train a model using embeddings to classify documents into categories. Classification tutorial
Embedding models
The Gemini API offers three models that generate text embeddings: gemini-embedding-exp-03-07 text-embedding-004 embedding-001
What's next
Check out the embeddings quickstart notebook .
Last updated 2025-07-08 UTC.
|
text_content/docs_ephemeral-tokens_d7b3db56.txt
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/ephemeral-tokens
|
| 2 |
+
Title: Ephemeral tokens | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Ephemeral tokens | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Ephemeral tokens Ephemeral tokens are short-lived authentication tokens for accessing the Gemini API through WebSockets . They are designed to enhance security when you are connecting directly from a user's device to the API (a client-to-server implementation). Like standard API keys, ephemeral tokens can be extracted from client-side applications such as web browsers or mobile apps. But because ephemeral tokens expire quickly and can be restricted, they significantly reduce the security risks in a production environment. Note: Ephemeral tokens are only compatible with Live API at this time. You should use them when accessing the Live API directly from client-side applications to enhance API key security. How ephemeral tokens work Here's how ephemeral tokens work at a high level: Your client (e.g. web app) authenticates with your backend. Your backend requests an ephemeral token from Gemini API's provisioning service. Gemini API issues a short-lived token. Your backend sends the token to the client for WebSocket connections to Live API. You can do this by swapping your API key with an ephemeral token. The client then uses the token as if it were an API key. This enhances security because even if extracted, the token is short-lived, unlike a long-lived API key deployed client-side. Since the client sends data directly to Gemini, this also improves latency and avoids your backends needing to proxy the real time data. Create an ephemeral token Here is a simplified example of how to get an ephemeral token from Gemini. 
By default, you'll have 1 minute to start new Live API sessions using the token from this request ( newSessionExpireTime ), and 30 minutes to send messages over that connection ( expireTime ). Python import datetime from google import genai now = datetime . datetime . now ( tz = datetime . timezone . utc ) client = genai . Client ( http_options = { 'api_version' : 'v1alpha' ,} ) token = client . auth_tokens . create ( config = { 'uses' : 1 , # The ephemeral token can only be used to start a single session 'expire_time' : now + datetime . timedelta ( minutes = 30 ), # Default is 30 minutes in the future # 'expire_time': '2025-05-17T00:00:00Z', # Accepts isoformat. 'new_session_expire_time' : now + datetime . timedelta ( minutes = 1 ), # Default 1 minute in the future 'http_options' : { 'api_version' : 'v1alpha' }, } ) # You'll need to pass the value under token.name back to your client to use it JavaScript import { GoogleGenAI } from "@google/genai" ; const client = new GoogleGenAI ({}); const expireTime = new Date ( Date . now () + 30 * 60 * 1000 ). toISOString (); const token = await client . authTokens . create ({ config : { uses : 1 , // The default expireTime : expireTime , // Default is 30 mins newSessionExpireTime : new Date ( Date . now () + ( 1 * 60 * 1000 )), // Default 1 minute in the future httpOptions : { apiVersion : 'v1alpha' }, }, }); For expireTime value constraints, defaults, and other field specs, see the API reference . Within the expireTime timeframe, you'll need sessionResumption to reconnect the call every 10 minutes (this can be done with the same token even if uses: 1 ). It's also possible to lock an ephemeral token to a set of configurations. This might be useful to further improve security of your application and keep your system instructions on the server side. Python client = genai . Client ( http_options = { 'api_version' : 'v1alpha' ,} ) token = client . auth_tokens . 
create ( config = { 'uses' : 1 , 'live_connect_constraints' : { 'model' : 'gemini-2.0-flash-live-001' , 'config' : { 'session_resumption' :{}, 'temperature' : 0.7 , 'response_modalities' :[ 'TEXT' ] } }, 'http_options' : { 'api_version' : 'v1alpha' }, } ) # You'll need to pass the value under token.name back to your client to use it JavaScript import { GoogleGenAI } from "@google/genai" ; const client = new GoogleGenAI ({}); const expireTime = new Date ( Date . now () + 30 * 60 * 1000 ). toISOString (); const token = await client . authTokens . create ({ config : { uses : 1 , // The default expireTime : expireTime , liveConnectConstraints : { model : 'gemini-2.0-flash-live-001' , config : { sessionResumption : {}, temperature : 0.7 , responseModalities : [ 'TEXT' ] } }, httpOptions : { apiVersion : 'v1alpha' } } }); // You'll need to pass the value under token.name back to your client to use it You can also lock a subset of fields, see the SDK documentation for more info. Connect to Live API with an ephemeral token Once you have an ephemeral token, you use it as if it were an API key (but remember, it only works for the live API, and only with the v1alpha version of the API). Note that use of ephemeral tokens only adds value when deploying applications that follow client-to-server implementation approach. JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; // Use the token generated in the "Create an ephemeral token" section here const ai = new GoogleGenAI ({ apiKey : token . name }); const model = 'gemini-2.0-flash-live-001' ; const config = { responseModalities : [ Modality . TEXT ] }; async function main () { const session = await ai . live . connect ({ model : model , config : config , callbacks : { ... }, }); // Send content... session . close (); } main (); Note: If not using the SDK, note that ephemeral tokens must either be passed in an access_token query parameter, or in an HTTP Authorization prefixed by the auth-scheme Token . 
See Get started with Live API for more examples.
Best practices
Set a short expiration duration using the expire_time parameter.
Tokens expire, requiring re-initiation of the provisioning process.
Verify secure authentication for your own backend. Ephemeral tokens will only be as secure as your backend authentication method.
Generally, avoid using ephemeral tokens for backend-to-Gemini connections, as this path is typically considered secure.
Limitations
Ephemeral tokens are only compatible with Live API at this time.
What's next
Read the Live API reference on ephemeral tokens for more information.
Last updated 2025-07-10 UTC.
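The best practice of keeping expirations short can be sketched as a small helper that builds the config dictionary before it is handed to client.auth_tokens.create. The keys uses, expire_time, and new_session_expire_time come from the Python example on this page; the helper name build_token_config, its parameters, and the sanity check on the two durations are assumptions made for this sketch, not documented API rules. No API call is made here.

```python
import datetime


def build_token_config(
    now: datetime.datetime,
    lifetime_minutes: int = 30,
    new_session_minutes: int = 1,
    uses: int = 1,
) -> dict:
    """Build an ephemeral-token config with explicit, short expirations."""
    # Assumed sanity check: a session cannot usefully start after the token expires.
    if new_session_minutes > lifetime_minutes:
        raise ValueError("new_session_minutes must not exceed lifetime_minutes")
    return {
        "uses": uses,
        "expire_time": now + datetime.timedelta(minutes=lifetime_minutes),
        "new_session_expire_time": now + datetime.timedelta(minutes=new_session_minutes),
    }


now = datetime.datetime.now(tz=datetime.timezone.utc)
# Shorter than the 30-minute default, following the best practice above.
config = build_token_config(now, lifetime_minutes=10)
print(config["expire_time"].isoformat())
print(config["new_session_expire_time"].isoformat())
```

Passing now in as an argument keeps the helper deterministic, which makes the expiry arithmetic easy to check in isolation.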
|
text_content/docs_files_2dd3ac4c.txt
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/files#troubleshooting
|
| 2 |
+
Title: Files API | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Files API

The Gemini family of artificial intelligence (AI) models is built to handle various types of input data, including text, images, and audio. Because these models can handle more than one type or mode of data, the Gemini models are called multimodal models, or are described as having multimodal capabilities.

This guide shows you how to work with media files using the Files API. The basic operations are the same for audio files, images, videos, documents, and other supported file types.

For file prompting guidance, check out the File prompt guide section.

Upload a file

You can use the Files API to upload a media file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB.

The following code uploads a file and then uses the file in a call to generateContent.

Python

    from google import genai

    client = genai.Client()

    myfile = client.files.upload(file="path/to/sample.mp3")

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=["Describe this audio clip", myfile],
    )

    print(response.text)

JavaScript

    import {
      GoogleGenAI,
      createUserContent,
      createPartFromUri,
    } from "@google/genai";

    const ai = new GoogleGenAI({});

    async function main() {
      const myfile = await ai.files.upload({
        file: "path/to/sample.mp3",
        config: { mimeType: "audio/mpeg" },
      });

      const response = await ai.models.generateContent({
        model: "gemini-2.0-flash",
        contents: createUserContent([
          createPartFromUri(myfile.uri, myfile.mimeType),
          "Describe this audio clip",
        ]),
      });
      console.log(response.text);
    }

    await main();

Go

    file, err := client.UploadFileFromPath(ctx, "path/to/sample.mp3", nil)
    if err != nil {
        log.Fatal(err)
    }
    defer client.DeleteFile(ctx, file.Name)

    model := client.GenerativeModel("gemini-2.0-flash")
    resp, err := model.GenerateContent(ctx,
        genai.FileData{URI: file.URI},
        genai.Text("Describe this audio clip"))
    if err != nil {
        log.Fatal(err)
    }

    // printResponse is a helper that is not defined in this snippet.
    printResponse(resp)

REST

    # BASE_URL is not defined in the original snippet. It is assumed here to be
    # the API host used by the other requests in this guide.
    BASE_URL="https://generativelanguage.googleapis.com"

    AUDIO_PATH="path/to/sample.mp3"
    MIME_TYPE=$(file -b --mime-type "${AUDIO_PATH}")
    NUM_BYTES=$(wc -c < "${AUDIO_PATH}")
    DISPLAY_NAME=AUDIO

    tmp_header_file=upload-header.tmp

    # Initial resumable request defining metadata.
    # The upload URL is in the response headers, so dump them to a file.
    curl "${BASE_URL}/upload/v1beta/files" \
      -H "x-goog-api-key: $GEMINI_API_KEY" \
      -D "${tmp_header_file}" \
      -H "X-Goog-Upload-Protocol: resumable" \
      -H "X-Goog-Upload-Command: start" \
      -H "X-Goog-Upload-Header-Content-Length: ${NUM_BYTES}" \
      -H "X-Goog-Upload-Header-Content-Type: ${MIME_TYPE}" \
      -H "Content-Type: application/json" \
      -d "{'file': {'display_name': '${DISPLAY_NAME}'}}" 2> /dev/null

    upload_url=$(grep -i "x-goog-upload-url: " "${tmp_header_file}" | cut -d" " -f2 | tr -d "\r")
    rm "${tmp_header_file}"

    # Upload the actual bytes.
    curl "${upload_url}" \
      -H "Content-Length: ${NUM_BYTES}" \
      -H "X-Goog-Upload-Offset: 0" \
      -H "X-Goog-Upload-Command: upload, finalize" \
      --data-binary "@${AUDIO_PATH}" 2> /dev/null > file_info.json

    # jq keeps the JSON quotes here, so $file_uri can be spliced into the body below.
    file_uri=$(jq ".file.uri" file_info.json)
    echo file_uri=$file_uri

    # Now generate content using that file.
    # ${MIME_TYPE} is spliced in outside the single quotes so that the shell expands it.
    curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent" \
      -H "x-goog-api-key: $GEMINI_API_KEY" \
      -H 'Content-Type: application/json' \
      -X POST \
      -d '{
        "contents": [{
          "parts":[
            {"text": "Describe this audio clip"},
            {"file_data":{"mime_type": "'"${MIME_TYPE}"'", "file_uri": '$file_uri'}}]
          }]
        }' 2> /dev/null > response.json

    cat response.json
    echo

    jq ".candidates[].content.parts[].text" response.json

Get metadata for a file

You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get.

Python

    myfile = client.files.upload(file='path/to/sample.mp3')
    file_name = myfile.name
    myfile = client.files.get(name=file_name)
    print(myfile)

JavaScript

    const myfile = await ai.files.upload({
      file: "path/to/sample.mp3",
      config: { mimeType: "audio/mpeg" },
    });

    const fileName = myfile.name;
    const fetchedFile = await ai.files.get({ name: fileName });
    console.log(fetchedFile);

Go

    file, err := client.UploadFileFromPath(ctx, "path/to/sample.mp3", nil)
    if err != nil {
        log.Fatal(err)
    }

    gotFile, err := client.GetFile(ctx, file.Name)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Got file:", gotFile.Name)

REST

    # file_info.json was created in the upload example.
    # -r strips the JSON quotes; the name already has the form files/<id>.
    name=$(jq -r ".file.name" file_info.json)

    # Get the file of interest to check its state.
    curl "https://generativelanguage.googleapis.com/v1beta/${name}" \
      -H "x-goog-api-key: $GEMINI_API_KEY" > file_info.json

    # Print some information about the file you got.
    # Assumption: the GET response is the file resource itself, not wrapped in "file".
    name=$(jq -r ".name" file_info.json)
    echo name=$name
    file_uri=$(jq -r ".uri" file_info.json)
    echo file_uri=$file_uri

List uploaded files

You can upload multiple files using the Files API. The following code gets a list of all the files uploaded:

Python

    print('My files:')
    for f in client.files.list():
        print(' ', f.name)

JavaScript

    const listResponse = await ai.files.list({ config: { pageSize: 10 } });
    for await (const file of listResponse) {
      console.log(file.name);
    }

Go

    iter := client.ListFiles(ctx)
    for {
        ifile, err := iter.Next()
        if err == iterator.Done {
            break
        }
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(ifile.Name)
    }

REST

    echo "My files: "

    curl "https://generativelanguage.googleapis.com/v1beta/files" \
      -H "x-goog-api-key: $GEMINI_API_KEY"

Delete uploaded files

Files are automatically deleted after 48 hours. You can also manually delete an uploaded file:

Python

    myfile = client.files.upload(file='path/to/sample.mp3')
    client.files.delete(name=myfile.name)

JavaScript

    const myfile = await ai.files.upload({
      file: "path/to/sample.mp3",
      config: { mimeType: "audio/mpeg" },
    });

    const fileName = myfile.name;
    await ai.files.delete({ name: fileName });

Go

    file, err := client.UploadFileFromPath(ctx, "path/to/sample.mp3", nil)
    if err != nil {
        log.Fatal(err)
    }
    client.DeleteFile(ctx, file.Name)

REST

    # $name holds the files/<id> value read in the metadata example.
    curl --request "DELETE" "https://generativelanguage.googleapis.com/v1beta/${name}" \
      -H "x-goog-api-key: $GEMINI_API_KEY"

Usage info

You can use the Files API to upload and interact with media files.
The Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. During that time, you can use the API to get metadata about the files, but you can't download the files. The Files API is available at no cost in all regions where the Gemini API is available. File prompting strategies This section provides guidance and best practices for using media files with prompts for the Gemini API. Being able to use various types of data in your prompts gives you more flexibility in terms of what tasks you can tackle with the Gemini API. For example, you can send the model a photo of a delicious meal and ask it to write a short blog about the meal. Prompt Response Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping. Meal prepping is a great way to save time and money, and it can also help you to eat healthier. This meal is a great example of a healthy and delicious meal that can be easily prepped ahead of time. This meal features brown rice, roasted vegetables, and chicken teriyaki. The brown rice is a healthy and complex carbohydrate that will give you sustained energy. The roasted vegetables are a great way to get your daily dose of vitamins and minerals, and the chicken teriyaki is a delicious and protein-rich option. This meal is also very easy to prepare. Simply cook the brown rice, roast the vegetables, and cook the chicken teriyaki. Once everything is cooked, divide it into meal prep containers and store them in the refrigerator. You can then grab a container and go on busy mornings! If you're looking for a healthy and delicious meal that can be easily prepped ahead of time, this meal is a great option. It's packed with nutrients and flavor, and it's sure to keep you feeling full and satisfied. Here's to healthy and delicious meal prepping! 
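The size limits in the usage info above, together with the 20 MB request-size guidance from the upload section, can be sketched as a pre-flight check in Python. This is only an illustration: the function names and return labels are hypothetical and not part of any SDK, and the sketch assumes binary units (1 MB = 1024 × 1024 bytes), which this guide does not specify.

```python
import os

# Assumption: binary units. The guide states the limits in MB and GB only.
MB = 1024 * 1024
GB = 1024 * MB

INLINE_REQUEST_LIMIT = 20 * MB  # above this total request size, use the Files API
PER_FILE_LIMIT = 2 * GB         # maximum size of a single uploaded file


def choose_upload_path(file_size: int, prompt_size: int = 0) -> str:
    """Return "inline", "files_api", or "too_large" for a planned request.

    Both arguments are byte counts. The labels are hypothetical.
    """
    if file_size > PER_FILE_LIMIT:
        return "too_large"
    if file_size + prompt_size > INLINE_REQUEST_LIMIT:
        return "files_api"
    return "inline"


def choose_upload_path_for(path: str, prompt: str = "") -> str:
    """Make the same decision from a file on disk and a text prompt."""
    return choose_upload_path(os.path.getsize(path), len(prompt.encode("utf-8")))


small = choose_upload_path(5 * MB)
medium = choose_upload_path(50 * MB)
huge = choose_upload_path(3 * GB)
print(small, medium, huge)
```

A result of "inline" only means the request is under the 20 MB threshold; uploading through the Files API is still possible for small files.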
If you are having trouble getting the output you want from prompts that use media files, the following strategies can help. The sections below provide design approaches and troubleshooting tips for improving prompts that use multimodal input.

You can improve your multimodal prompts by following these best practices:

Prompt design fundamentals

- Be specific in your instructions: Craft clear and concise instructions that leave minimal room for misinterpretation.
- Add a few examples to your prompt: Use realistic few-shot examples to illustrate what you want to achieve.
- Break it down step-by-step: Divide complex tasks into manageable sub-goals, guiding the model through the process.
- Specify the output format: In your prompt, ask for the output to be in the format you want, such as Markdown, JSON, or HTML.
- Put your image first for single-image prompts: While Gemini can handle image and text inputs in any order, for prompts containing a single image, it might perform better if that image (or video) is placed before the text prompt. However, for prompts that require images to be highly interleaved with text to make sense, use whatever order is most natural.

Troubleshooting your multimodal prompt

- If the model is not drawing information from the relevant part of the image: Give hints about which aspects of the image you want the prompt to draw information from.
- If the model output is too generic (not tailored enough to the image/video input): At the start of the prompt, try asking the model to describe the image(s) or video before providing the task instruction, or try asking the model to refer to what's in the image.
- To troubleshoot which part failed: Ask the model to describe the image, or ask the model to explain its reasoning, to gauge the model's initial understanding.
- If your prompt results in hallucinated content: Try dialing down the temperature setting or asking the model for shorter descriptions so that it's less likely to extrapolate additional details.
- Tuning the sampling parameters: Experiment with different temperature settings and top-k selections to adjust the model's creativity.

Be specific in your instructions

Prompts have the most success when they are clear and detailed. If you have a specific output in mind, include that requirement in the prompt to make it more likely that you get the output you want.

For this image of an airport board, asking the model to just "describe this image" could generate a general description. If you need the model to parse the time and city from the image, you can include that request directly in your prompt.

Prompt: Describe this image.
Model response: The image shows an airport arrivals and departures board.

Updated prompt: Parse the time and city from the airport board shown in this image into a list.
Improved response:
- 10:50 Moscow
- 11:05 Edinburgh
- 11:05 London
- 11:10 Bucharest
- 11:30 Kiev
- 11:35 Dublin
- 11:45 East Midlands
- 12:15 Sofia
- 12:30 London
- 12:30 Newcastle
- 12:40 St Petersburg
- 12:40 London
- 12:45 Manchester

Add a few examples

The Gemini model can accept multiple inputs, which it can use as examples to understand the output you want. Adding these examples can help the model identify the patterns and apply the relationship between the given images and responses to the new example. This is also called "few-shot" learning.

In the example below, the initial output is written in sentence form and also contains the country (Brazil). Suppose you want the output in a different format or style, and you want only the city, not the country. Adding few-shot examples to your prompt can steer the model to respond in the way you want.

Prompt: Determine the city along with the landmark.
Model response: The landmark is the Christ the Redeemer statue in Rio de Janeiro, Brazil.
Updated prompt (with few-shot examples): Determine the city along with the landmark.
- city: Rome, landmark: the Colosseum.
- city: Beijing, landmark: Forbidden City
Improved response: city: Rio de Janeiro, landmark: Christ the Redeemer statue

Break it down step-by-step

For complex tasks, such as those that require both visual understanding and reasoning, it can be helpful to split the task into smaller, more straightforward steps. Alternatively, it can also be effective to directly ask the model to "think step by step" in your prompt.

Prompt: When will I run out of toilet paper?
Model response: Soon, you only have 3 rolls left.

Updated prompt:
1. First, count how many toilet paper rolls are in this picture.
2. Then, determine how much toilet paper a typical person uses per day.
3. Calculate how long these rolls of toilet paper will last.
Improved response:
1. There are 3 rolls of toilet paper in this picture.
2. A typical person uses about 20 sheets of toilet paper per day.
3. If each roll contains 200 sheets, then each roll will last for about 10 days. Therefore, the 3 rolls will last for about a month.

Math problems or other types of word problems are great candidates for asking the model to think step-by-step.

Prompt: What is the 4th term in the sequence?
Response: -135

The response from the model is incorrect. One way to improve it is to ask the model to parse the problem before answering the prompt.

Prompt: Parse the formula in the image first. Then based on the formula, answer "what's the 4th term in the sequence?".
Response:

$$ b(n) = b(n-1) * (-3) $$
$$ b(1) = 15 $$

Substitute n = 4 into the formula:

$$ b(4) = b(4-1) * (-3) $$
$$ b(4) = b(3) * (-3) $$

We don't know the value of b(3), so substitute n = 3 into the formula:

$$ b(3) = b(3-1) * (-3) $$
$$ b(3) = b(2) * (-3) $$

We don't know the value of b(2), so substitute n = 2 into the formula:

$$ b(2) = b(2-1) * (-3) $$
$$ b(2) = b(1) * (-3) $$

We know the value of b(1) is 15, so substitute it into the formula:

$$ b(2) = 15 * (-3) $$
$$ b(2) = -45 $$

Now that we know the value of b(2), substitute it into the formula for b(3):

$$ b(3) = b(2) * (-3) $$
$$ b(3) = -45 * (-3) $$
$$ b(3) = 135 $$

Now that we know the value of b(3), substitute it into the formula for b(4):

$$ b(4) = b(3) * (-3) $$
$$ b(4) = 135 * (-3) $$
$$ b(4) = -405 $$

Therefore, the 4th term in the sequence is -405.

This model response is correct. To make the response more streamlined, you can ask the model to think step by step.

Updated prompt: What is the 4th term in the sequence? Think step by step.
Improved response: The first term is 15. The second term is b(1) * (-3) = 15 * (-3) = -45. The third term is b(2) * (-3) = -45 * (-3) = 135. The fourth term is b(3) * (-3) = 135 * (-3) = -405.

Specify the output format

A common problem is the need for model output to be in a certain format (e.g. Markdown, JSON, HTML), especially if the model output needs to be ingested by a downstream task. You can try to produce output in that format by instructing the model to do so within the prompt.
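When a downstream task ingests model output, it can help to parse that output defensively. The following Python sketch assumes the model was asked for JSON and that its reply may or may not be wrapped in a Markdown code fence. The helper `parse_json_response` and the sample string are hypothetical illustrations, not part of the Gemini SDK.

```python
import json

# A Markdown code fence, built from single characters to avoid nesting fences here.
FENCE = "`" * 3


def parse_json_response(text: str) -> dict:
    """Parse model output as JSON, tolerating an optional Markdown fence."""
    body = text.strip()
    if body.startswith(FENCE):
        # Drop the opening fence line (the fence plus an optional language tag).
        body = body.split("\n", 1)[1] if "\n" in body else ""
        # Drop the closing fence if it is present.
        if body.rstrip().endswith(FENCE):
            body = body.rstrip()[: -len(FENCE)]
    return json.loads(body)


# Hypothetical model output shaped like a fenced JSON reply.
sample = FENCE + 'json\n{"type of cuisine": "Spanish", "vegetarian": false}\n' + FENCE
parsed = parse_json_response(sample)
print(parsed["type of cuisine"], parsed["vegetarian"])
```

If the reply is not valid JSON, `json.loads` raises `json.JSONDecodeError`, which gives the calling code a clear signal to retry or to tighten the format instruction in the prompt.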
Below are two examples:

Prompt: Parse the table in this image into markdown format
Model response:

| Subject | Status | Time | Start date | Due on |
| ------- | ------ | ---- | ---------- | ------ |
| Math | Done | 1.5 hrs | 11/29/2023 | 12/1/2023 |
| Geography | Done | 30 min | 12/2/2023 | 12/4/2023 |
| Physics | In progress | 2 hrs | 12/3/2023 | 12/5/2023 |
| Biology | Not started | 6 hrs | 11/2/2016 | 12/6/2023 |

Prompt: Provide a list of all the following attributes: ingredients, type of cuisine, vegetarian or not, in JSON format
Model response:

```json
{
  "ingredients": [
    "rice", "shrimp", "clams", "mussels", "peas", "tomatoes",
    "onions", "garlic", "olive oil", "paprika", "salt", "pepper"
  ],
  "type of cuisine": "Spanish",
  "vegetarian": false
}
```

Put your image first for single-image prompts

While Gemini can interpret image and text in any order within a prompt, placing a single image before the text prompt might lead to better results. In the following samples, you'll notice the image comes first before the text in each prompt.

Troubleshooting your multimodal prompt

You might need to troubleshoot your prompt if you are not getting a helpful response. Here are a few strategies you could try.

If the model is not drawing information from the relevant part of the image

To get a more specific response, you can point out which aspects of the image you want the prompt to draw information from to inform its response.

Prompt: How many days will these diapers last a baby?
Model response: The diapers will last for 198 days before they run out.

Updated prompt: How long will these diapers last before I run out? Use the weight shown on the box to determine the child's age, and use the total number of diapers in the box. Divide the total number by how many diapers the child goes through per day.
Improved response: The box contains 198 diapers. The box states the diapers are for a child weighing 22-37 lbs. The average child in this weight range is likely between 1 and 2 years old.
A 2-year-old typically goes through 6 diapers per day, therefore, the diapers will last around 33 days (198/6). If the model output is too generic and not tailored enough to the image input To help the model tailor its response to the image(s), try asking it to describe the images before performing its reasoning task. Prompt Model response What is in common between these images? All four images are examples of landscape photography. Updated prompt Improved response First, describe what's in each image in detail. What's in common between these images? The first image shows a person packing a suitcase. The second image shows an empty hallway in a shopping mall. The third image shows a roll of toilet paper. The fourth image shows a dark and stormy sea. The commonality between these images is that they are all examples of liminal spaces. Liminal spaces are places or situations that are in between two other places or situations. They are often characterized by a sense of uncertainty or ambiguity. The first image shows the person in between being at home and being on a trip. The second image shows the hallway in between two stores. The third image is of toilet paper, which can represent a bathroom. A bathroom is a transitional space between private and public realms. It's a place where we prepare ourselves to enter or leave the outside world. The fourth image shows the sea in between being calm and being stormy. Liminal spaces can be both exciting and unsettling. They can be a place of new beginning or a place of endings. They can be a place of reflection or a place of action. You can also try explicitly asking the model to refer to what's in the image. Updated prompt Improved response What is in common between these images? Refer to what's in the images in your response. All of the images are related to travel. 
The first image shows a person packing a suitcase, the second image shows an empty hallway in a shopping mall, the third image shows a stack of toilet paper, and the fourth image shows a dark and stormy sea. These images are all related to travel because they are all things that people need or use when they travel. The first image shows a person packing a suitcase, which is something that people do when they are going on a trip. The second image shows an empty hallway in a shopping mall, which is a place where people often go to buy things for their trips. The third image shows a stack of toilet paper, which is something that people often use when they are traveling. The fourth image shows a dark and stormy sea, which is something that people often encounter when they are traveling. The same tips work for creative output from images. In the following example with the rabbit, explicitly asking the model to include aspects from the image can improve the output. Prompt Model response Can you write me a poem about this image? Little bunny, so soft and brown, Hopping through the field, up and down. Your ears so long, your tail so small, You're the cutest creature of them all. You nibble on the grass, so green and sweet, And hop away when you see me. But I'll always be your friend, Little bunny, until the very end. Updated prompt Improved response Can you write me a descriptive and dramatic poem about this image and include the location? Little bunny on the green, Ears so long, so keen. What do you see out there, In the distance, where you stare? Is it the ocean blue, Sparkling in the sun's hue? Or the waves that crash and roar, A rhythm that you adore? Little bunny, tell me true, What captures your attention so? Is it the salty breeze, Or the seagulls' cries that please? Perhaps it's the vastness of it all, A world so big and small. A place where dreams can roam, And adventures can be sown. Little bunny, I wonder what you dream, As you sit on the grass, so serene. 
Do you yearn to explore the deep, Or stay on land, where you can leap? Whatever it may be, little bunny, Keep that spark of wonder burning brightly. For in your dreams and aspirations, Lies a world waiting for your creation.

Troubleshooting which part of the prompt failed

It can be hard to know whether a prompt failed because the model didn't understand the image to begin with, or if it did understand the image but did not perform the correct reasoning steps afterward. To disambiguate between those reasons, ask the model to describe what's in the image.

In the following example, if the model responds with a snack that seems surprising when paired with tea (e.g. popcorn), you can first troubleshoot to determine whether the model correctly recognized that the image contains tea.

Prompt: What's a snack I can make in 1 minute that would go well with this?
Prompt for troubleshooting: Describe what's in this image.

Another strategy is to ask the model to explain its reasoning. That can help you narrow down which part of the reasoning broke down, if any.

Prompt: What's a snack I can make in 1 minute that would go well with this?
Prompt for troubleshooting: What's a snack I can make in 1 minute that would go well with this? Please explain why.

What's next

- Try writing your own multimodal prompts using Google AI Studio.
- For information on using the Gemini Files API for uploading media files and including them in your prompts, see the Vision, Audio, and Document processing guides.
- For more guidance on prompt design, like tuning sampling parameters, see the Prompt strategies page.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. Last updated 2025-06-27 UTC.
|
text_content/docs_files_7ee302e6.txt
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/files#prompt-guide
|
| 2 |
+
Title: Files API | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Files API

The Gemini family of artificial intelligence (AI) models is built to handle various types of input data, including text, images, and audio. Because these models can handle more than one type or mode of data, the Gemini models are called multimodal models, or are described as having multimodal capabilities.

This guide shows you how to work with media files using the Files API. The basic operations are the same for audio files, images, videos, documents, and other supported file types.

For file prompting guidance, check out the File prompt guide section.

Upload a file

You can use the Files API to upload a media file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB.

The following code uploads a file and then uses the file in a call to generateContent.

Python

    from google import genai

    client = genai.Client()

    myfile = client.files.upload(file="path/to/sample.mp3")

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=["Describe this audio clip", myfile],
    )

    print(response.text)

JavaScript

    import {
      GoogleGenAI,
      createUserContent,
      createPartFromUri,
    } from "@google/genai";

    const ai = new GoogleGenAI({});

    async function main() {
      const myfile = await ai.files.upload({
        file: "path/to/sample.mp3",
        config: { mimeType: "audio/mpeg" },
      });

      const response = await ai.models.generateContent({
        model: "gemini-2.0-flash",
        contents: createUserContent([
          createPartFromUri(myfile.uri, myfile.mimeType),
          "Describe this audio clip",
        ]),
      });
      console.log(response.text);
    }

    await main();

Go

    file, err := client.UploadFileFromPath(ctx, "path/to/sample.mp3", nil)
    if err != nil {
        log.Fatal(err)
    }
    defer client.DeleteFile(ctx, file.Name)

    model := client.GenerativeModel("gemini-2.0-flash")
    resp, err := model.GenerateContent(ctx,
        genai.FileData{URI: file.URI},
        genai.Text("Describe this audio clip"))
    if err != nil {
        log.Fatal(err)
    }

    // printResponse is a helper that is not defined in this snippet.
    printResponse(resp)

REST

    # BASE_URL is not defined in the original snippet. It is assumed here to be
    # the API host used by the other requests in this guide.
    BASE_URL="https://generativelanguage.googleapis.com"

    AUDIO_PATH="path/to/sample.mp3"
    MIME_TYPE=$(file -b --mime-type "${AUDIO_PATH}")
    NUM_BYTES=$(wc -c < "${AUDIO_PATH}")
    DISPLAY_NAME=AUDIO

    tmp_header_file=upload-header.tmp

    # Initial resumable request defining metadata.
    # The upload URL is in the response headers, so dump them to a file.
    curl "${BASE_URL}/upload/v1beta/files" \
      -H "x-goog-api-key: $GEMINI_API_KEY" \
      -D "${tmp_header_file}" \
      -H "X-Goog-Upload-Protocol: resumable" \
      -H "X-Goog-Upload-Command: start" \
      -H "X-Goog-Upload-Header-Content-Length: ${NUM_BYTES}" \
      -H "X-Goog-Upload-Header-Content-Type: ${MIME_TYPE}" \
      -H "Content-Type: application/json" \
      -d "{'file': {'display_name': '${DISPLAY_NAME}'}}" 2> /dev/null

    upload_url=$(grep -i "x-goog-upload-url: " "${tmp_header_file}" | cut -d" " -f2 | tr -d "\r")
    rm "${tmp_header_file}"

    # Upload the actual bytes.
    curl "${upload_url}" \
      -H "Content-Length: ${NUM_BYTES}" \
      -H "X-Goog-Upload-Offset: 0" \
      -H "X-Goog-Upload-Command: upload, finalize" \
      --data-binary "@${AUDIO_PATH}" 2> /dev/null > file_info.json

    # jq keeps the JSON quotes here, so $file_uri can be spliced into the body below.
    file_uri=$(jq ".file.uri" file_info.json)
    echo file_uri=$file_uri

    # Now generate content using that file.
    # ${MIME_TYPE} is spliced in outside the single quotes so that the shell expands it.
    curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent" \
      -H "x-goog-api-key: $GEMINI_API_KEY" \
      -H 'Content-Type: application/json' \
      -X POST \
      -d '{
        "contents": [{
          "parts":[
            {"text": "Describe this audio clip"},
            {"file_data":{"mime_type": "'"${MIME_TYPE}"'", "file_uri": '$file_uri'}}]
          }]
        }' 2> /dev/null > response.json

    cat response.json
    echo

    jq ".candidates[].content.parts[].text" response.json

Get metadata for a file

You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get.

Python

    myfile = client.files.upload(file='path/to/sample.mp3')
    file_name = myfile.name
    myfile = client.files.get(name=file_name)
    print(myfile)

JavaScript

    const myfile = await ai.files.upload({
      file: "path/to/sample.mp3",
      config: { mimeType: "audio/mpeg" },
    });

    const fileName = myfile.name;
    const fetchedFile = await ai.files.get({ name: fileName });
    console.log(fetchedFile);

Go

    file, err := client.UploadFileFromPath(ctx, "path/to/sample.mp3", nil)
    if err != nil {
        log.Fatal(err)
    }

    gotFile, err := client.GetFile(ctx, file.Name)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Got file:", gotFile.Name)

REST

    # file_info.json was created in the upload example.
    # -r strips the JSON quotes; the name already has the form files/<id>.
    name=$(jq -r ".file.name" file_info.json)

    # Get the file of interest to check its state.
    curl "https://generativelanguage.googleapis.com/v1beta/${name}" \
      -H "x-goog-api-key: $GEMINI_API_KEY" > file_info.json

    # Print some information about the file you got.
    # Assumption: the GET response is the file resource itself, not wrapped in "file".
    name=$(jq -r ".name" file_info.json)
    echo name=$name
    file_uri=$(jq -r ".uri" file_info.json)
    echo file_uri=$file_uri

List uploaded files

You can upload multiple files using the Files API. The following code gets a list of all the files uploaded:

Python

    print('My files:')
    for f in client.files.list():
        print(' ', f.name)

JavaScript

    const listResponse = await ai.files.list({ config: { pageSize: 10 } });
    for await (const file of listResponse) {
      console.log(file.name);
    }

Go

    iter := client.ListFiles(ctx)
    for {
        ifile, err := iter.Next()
        if err == iterator.Done {
            break
        }
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(ifile.Name)
    }

REST

    echo "My files: "

    curl "https://generativelanguage.googleapis.com/v1beta/files" \
      -H "x-goog-api-key: $GEMINI_API_KEY"

Delete uploaded files

Files are automatically deleted after 48 hours. You can also manually delete an uploaded file:

Python

    myfile = client.files.upload(file='path/to/sample.mp3')
    client.files.delete(name=myfile.name)

JavaScript

    const myfile = await ai.files.upload({
      file: "path/to/sample.mp3",
      config: { mimeType: "audio/mpeg" },
    });

    const fileName = myfile.name;
    await ai.files.delete({ name: fileName });

Go

    file, err := client.UploadFileFromPath(ctx, "path/to/sample.mp3", nil)
    if err != nil {
        log.Fatal(err)
    }
    client.DeleteFile(ctx, file.Name)

REST

    # $name holds the files/<id> value read in the metadata example.
    curl --request "DELETE" "https://generativelanguage.googleapis.com/v1beta/${name}" \
      -H "x-goog-api-key: $GEMINI_API_KEY"

Usage info

You can use the Files API to upload and interact with media files.
The Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. During that time, you can use the API to get metadata about the files, but you can't download the files. The Files API is available at no cost in all regions where the Gemini API is available. File prompting strategies This section provides guidance and best practices for using media files with prompts for the Gemini API. Being able to use various types of data in your prompts gives you more flexibility in terms of what tasks you can tackle with the Gemini API. For example, you can send the model a photo of a delicious meal and ask it to write a short blog about the meal. Prompt Response Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping. Meal prepping is a great way to save time and money, and it can also help you to eat healthier. This meal is a great example of a healthy and delicious meal that can be easily prepped ahead of time. This meal features brown rice, roasted vegetables, and chicken teriyaki. The brown rice is a healthy and complex carbohydrate that will give you sustained energy. The roasted vegetables are a great way to get your daily dose of vitamins and minerals, and the chicken teriyaki is a delicious and protein-rich option. This meal is also very easy to prepare. Simply cook the brown rice, roast the vegetables, and cook the chicken teriyaki. Once everything is cooked, divide it into meal prep containers and store them in the refrigerator. You can then grab a container and go on busy mornings! If you're looking for a healthy and delicious meal that can be easily prepped ahead of time, this meal is a great option. It's packed with nutrients and flavor, and it's sure to keep you feeling full and satisfied. Here's to healthy and delicious meal prepping! 
If you are having trouble getting the output you want from prompts that use media files, the following strategies can help. The sections below provide design approaches and troubleshooting tips for improving prompts that use multimodal input.

You can improve your multimodal prompts by following these best practices:

Prompt design fundamentals

- Be specific in your instructions: Craft clear and concise instructions that leave minimal room for misinterpretation.
- Add a few examples to your prompt: Use realistic few-shot examples to illustrate what you want to achieve.
- Break it down step-by-step: Divide complex tasks into manageable sub-goals, guiding the model through the process.
- Specify the output format: In your prompt, ask for the output to be in the format you want, such as Markdown, JSON, or HTML.
- Put your image first for single-image prompts: While Gemini can handle image and text inputs in any order, for prompts containing a single image, it might perform better if that image (or video) is placed before the text prompt. However, for prompts that require images to be highly interleaved with text to make sense, use whatever order is most natural.

Troubleshooting your multimodal prompt

- If the model is not drawing information from the relevant part of the image: Give hints about which aspects of the image you want the prompt to draw information from.
- If the model output is too generic (not tailored enough to the image/video input): At the start of the prompt, try asking the model to describe the image(s) or video before providing the task instruction, or try asking the model to refer to what's in the image.
- To troubleshoot which part failed: Ask the model to describe the image, or ask the model to explain its reasoning, to gauge the model's initial understanding.
- If your prompt results in hallucinated content: Try dialing down the temperature setting or asking the model for shorter descriptions so that it's less likely to extrapolate additional details.
- Tuning the sampling parameters: Experiment with different temperature settings and top-k selections to adjust the model's creativity.

Be specific in your instructions

Prompts have the most success when they are clear and detailed. If you have a specific output in mind, include that requirement in the prompt to make it more likely that you get the output you want.

For this image of an airport board, asking the model to just "describe this image" could generate a general description. If you need the model to parse the time and city from the image, you can include that request directly in your prompt.

Prompt: Describe this image.
Model response: The image shows an airport arrivals and departures board.

Updated prompt: Parse the time and city from the airport board shown in this image into a list.
Improved response:
- 10:50 Moscow
- 11:05 Edinburgh
- 11:05 London
- 11:10 Bucharest
- 11:30 Kiev
- 11:35 Dublin
- 11:45 East Midlands
- 12:15 Sofia
- 12:30 London
- 12:30 Newcastle
- 12:40 St Petersburg
- 12:40 London
- 12:45 Manchester

Add a few examples

The Gemini model can accept multiple inputs, which it can use as examples to understand the output you want. Adding these examples can help the model identify the patterns and apply the relationship between the given images and responses to the new example. This is also called "few-shot" learning.

In the example below, the initial output is written in sentence form and also contains the country (Brazil). Suppose you want the output in a different format or style, and you want only the city, not the country. Adding few-shot examples to your prompt can steer the model to respond in the way you want.

Prompt: Determine the city along with the landmark.
Model response: The landmark is the Christ the Redeemer statue in Rio de Janeiro, Brazil.
Updated prompt Improved response Determine the city along with the landmark. city: Rome, landmark: the Colosseum. city: Beijing, landmark: Forbidden City city: Rio de Janeiro, landmark: Christ the Redeemer statue Break it down step-by-step For complex tasks, such as those that require both visual understanding and reasoning, it can be helpful to split the task into smaller, more straightforward steps. Alternatively, you can ask the model directly to “think step by step” in your prompt. Prompt Model response When will I run out of toilet paper? Soon, you only have 3 rolls left. Updated prompt Improved response 1. First, count how many toilet paper rolls are in this picture. 2. Then, determine how much toilet paper a typical person uses per day. 3. Calculate how long these rolls of toilet paper will last. 1. There are 3 rolls of toilet paper in this picture. 2. A typical person uses about 20 sheets of toilet paper per day. 3. If each roll contains 200 sheets, then each roll will last for about 10 days. Therefore, the 3 rolls will last for about a month. Math problems or other types of word problems are great candidates for asking the model to think step-by-step. Prompt Response What is the 4th term in the sequence? -135 The response from the model is incorrect. One way to improve this is to ask the model to parse the problem before answering the prompt. Prompt Response Parse the formula in the image first. Then based on the formula, answer "what's the 4th term in the sequence?". 
$$ b(n) = b(n-1) * (-3) $$ $$ b(1) = 15 $$ Substitute n = 4 into the formula: $$ b(4) = b(4-1) * (-3) $$ $$ b(4) = b(3) * (-3) $$ We don't know the value of b(3), so substitute n = 3 into the formula: $$ b(3) = b(3-1) * (-3) $$ $$ b(3) = b(2) * (-3) $$ We don't know the value of b(2), so substitute n = 2 into the formula: $$ b(2) = b(2-1) * (-3) $$ $$ b(2) = b(1) * (-3) $$ We know the value of b(1) is 15, so substitute it into the formula: $$ b(2) = 15 * (-3) $$ $$ b(2) = -45 $$ Now that we know the value of b(2), substitute it into the formula for b(3): $$ b(3) = b(2) * (-3) $$ $$ b(3) = -45 * (-3) $$ $$ b(3) = 135 $$ Now that we know the value of b(3), substitute it into the formula for b(4): $$ b(4) = b(3) * (-3) $$ $$ b(4) = 135 * (-3) $$ $$ b(4) = -405 $$ Therefore, the 4th term in the sequence is -405. This model response is correct. To make the response more streamlined, you can ask the model to think step by step. Updated prompt Improved response What is the 4th term in the sequence? Think step by step. The first term is 15. The second term is b(1) * (-3) = 15 * (-3) = -45. The third term is b(2) * (-3) = -45 * (-3) = 135. The fourth term is b(3) * (-3) = 135 * (-3) = -405. Specify the output format A common problem is the need for model output to be in a certain format (e.g. markdown, JSON, HTML), especially if the model output needs to be ingested by a downstream task. You can try to produce output in that format by instructing the model to do so within the prompt. 
Below are two examples: Prompt Model response Parse the table in this image into markdown format | Subject | Status | Time | Start date | Due on | | ------- | ------ | ---- | ---------- | ------ | | Math | Done | 1.5 hrs | 11/29/2023 | 12/1/2023 | | Geography | Done | 30 min | 12/2/2023 | 12/4/2023 | | Physics | In progress | 2 hrs | 12/3/2023 | 12/5/2023 | | Biology | Not started | 6 hrs | 11/2/2016 | 12/6/2023 | Prompt Model response Provide a list of all the following attributes: ingredients, type of cuisine, vegetarian or not, in JSON format ```json { "ingredients": [ "rice", "shrimp", "clams", "mussels", "peas", "tomatoes", "onions", "garlic", "olive oil", "paprika", "salt", "pepper" ], "type of cuisine": "Spanish", "vegetarian": false } ``` Put your image first for single-image prompts While Gemini can interpret image and text in any order within a prompt, placing a single image before the text prompt might lead to better results. In the following samples, you'll notice the image comes first before the text in each prompt. Troubleshooting your multimodal prompt You might need to troubleshoot your prompt if you are not getting a helpful response. Here are a few strategies you could try. If the model is not drawing information from the relevant part of the image To get a more specific response, you can point out which aspects of the image you want the prompt to draw information from to inform its response. Prompt Model response How many days will these diapers last a baby? The diapers will last for 198 days before they run out. Updated prompt Improved response How long will these diapers last before I run out? Use the weight shown on the box to determine the child's age, and use the total number of diapers in the box. Divide the total number by how many diapers the child goes through per day. The box contains 198 diapers. The box states the diapers are for a child weighing 22-37 lbs. The average child in this weight range is likely between 1 and 2 years old. 
A 2-year-old typically goes through 6 diapers per day, therefore, the diapers will last around 33 days (198/6). If the model output is too generic and not tailored enough to the image input To help the model tailor its response to the image(s), try asking it to describe the images before performing its reasoning task. Prompt Model response What is in common between these images? All four images are examples of landscape photography. Updated prompt Improved response First, describe what's in each image in detail. What's in common between these images? The first image shows a person packing a suitcase. The second image shows an empty hallway in a shopping mall. The third image shows a roll of toilet paper. The fourth image shows a dark and stormy sea. The commonality between these images is that they are all examples of liminal spaces. Liminal spaces are places or situations that are in between two other places or situations. They are often characterized by a sense of uncertainty or ambiguity. The first image shows the person in between being at home and being on a trip. The second image shows the hallway in between two stores. The third image is of toilet paper, which can represent a bathroom. A bathroom is a transitional space between private and public realms. It's a place where we prepare ourselves to enter or leave the outside world. The fourth image shows the sea in between being calm and being stormy. Liminal spaces can be both exciting and unsettling. They can be a place of new beginning or a place of endings. They can be a place of reflection or a place of action. You can also try explicitly asking the model to refer to what's in the image. Updated prompt Improved response What is in common between these images? Refer to what's in the images in your response. All of the images are related to travel. 
The first image shows a person packing a suitcase, the second image shows an empty hallway in a shopping mall, the third image shows a stack of toilet paper, and the fourth image shows a dark and stormy sea. These images are all related to travel because they are all things that people need or use when they travel. The first image shows a person packing a suitcase, which is something that people do when they are going on a trip. The second image shows an empty hallway in a shopping mall, which is a place where people often go to buy things for their trips. The third image shows a stack of toilet paper, which is something that people often use when they are traveling. The fourth image shows a dark and stormy sea, which is something that people often encounter when they are traveling. The same tips work for creative output from images. In the following example with the rabbit, explicitly asking the model to include aspects from the image can improve the output. Prompt Model response Can you write me a poem about this image? Little bunny, so soft and brown, Hopping through the field, up and down. Your ears so long, your tail so small, You're the cutest creature of them all. You nibble on the grass, so green and sweet, And hop away when you see me. But I'll always be your friend, Little bunny, until the very end. Updated prompt Improved response Can you write me a descriptive and dramatic poem about this image and include the location? Little bunny on the green, Ears so long, so keen. What do you see out there, In the distance, where you stare? Is it the ocean blue, Sparkling in the sun's hue? Or the waves that crash and roar, A rhythm that you adore? Little bunny, tell me true, What captures your attention so? Is it the salty breeze, Or the seagulls' cries that please? Perhaps it's the vastness of it all, A world so big and small. A place where dreams can roam, And adventures can be sown. Little bunny, I wonder what you dream, As you sit on the grass, so serene. 
Do you yearn to explore the deep, Or stay on land, where you can leap? Whatever it may be, little bunny, Keep that spark of wonder burning brightly. For in your dreams and aspirations, Lies a world waiting for your creation. Troubleshooting which part of the prompt failed It can be hard to know whether a prompt failed because the model didn't understand the image to begin with, or if it did understand the image but did not perform the correct reasoning steps afterward. To disambiguate between those reasons, ask the model to describe what's in the image. In the following example, if the model responds with a snack that seems surprising when paired with tea (e.g. popcorn), you can first troubleshoot to determine whether the model correctly recognized that the image contains tea. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? Describe what's in this image. Another strategy is to ask the model to explain its reasoning. That can help you narrow down which part of the reasoning broke down, if any. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? What's a snack I can make in 1 minute that would go well with this? Please explain why. What's next Try writing your own multimodal prompts using Google AI Studio . For information on using the Gemini Files API for uploading media files and including them in your prompts, see the Vision , Audio , and Document processing guides. For more guidance on prompt design, like tuning sampling parameters, see the Prompt strategies page. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
|
text_content/docs_function-calling_5268a3c8.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/function-calling?example=weather#main-content
|
| 2 |
+
Title: Function calling with the Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Function calling with the Gemini API Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. Extend Capabilities: Use external tools to perform computations and extend the limitations of the model, such as using a calculator or creating charts. Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. Get Weather Python from google import genai from google.genai import types # Define the function declaration for the model weather_function = { "name" : "get_current_temperature" , "description" : "Gets the current temperature for a given location." , "parameters" : { "type" : "object" , "properties" : { "location" : { "type" : "string" , "description" : "The city name, e.g. San Francisco" , }, }, "required" : [ "location" ], }, } # Configure the client and tools client = genai . Client () tools = types . Tool ( function_declarations = [ weather_function ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Send request with function declarations response = client . models . 
generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in London?" , config = config , ) # Check for a function call if response . candidates [ 0 ] . content . parts [ 0 ] . function_call : function_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call print ( f "Function to call: { function_call . name } " ) print ( f "Arguments: { function_call . args } " ) # In a real app, you would call your function here: # result = get_current_temperature(**function_call.args) else : print ( "No function call found in the response." ) print ( response . text ) JavaScript import { GoogleGenAI , Type } from '@google/genai' ; // Configure the client const ai = new GoogleGenAI ({}); // Define the function declaration for the model const weatherFunctionDeclaration = { name : 'get_current_temperature' , description : 'Gets the current temperature for a given location.' , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , description : 'The city name, e.g. San Francisco' , }, }, required : [ 'location' ], }, }; // Send request with function declarations const response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : "What's the temperature in London?" , config : { tools : [{ functionDeclarations : [ weatherFunctionDeclaration ] }], }, }); // Check for function calls in the response if ( response . functionCalls && response . functionCalls . length > 0 ) { const functionCall = response . functionCalls [ 0 ]; // Assuming one function call console . log ( `Function to call: ${ functionCall . name } ` ); console . log ( `Arguments: ${ JSON . stringify ( functionCall . args ) } ` ); // In a real app, you would call your actual function here: // const result = await getCurrentTemperature(functionCall.args); } else { console . log ( "No function call found in the response." ); console . log ( response . 
text ); } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY" \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [ { "role": "user", "parts": [ { "text": "What'\''s the temperature in London?" } ] } ], "tools": [ { "functionDeclarations": [ { "name": "get_current_temperature", "description": "Gets the current temperature for a given location.", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city name, e.g. San Francisco" } }, "required": ["location"] } } ] } ] }' How function calling works Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: Define Function Declaration: Define the function declaration in your application code. Function declarations describe the function's name, parameters, and purpose to the model. Call LLM with function declarations: Send the user prompt along with the function declaration(s) to the model. The model analyzes the request and determines whether a function call would be helpful. If so, it responds with a structured JSON object. Execute Function Code (Your Responsibility): The model does not execute the function itself. Your application is responsible for processing the response and checking for a function call. Yes: Extract the name and args of the function and execute the corresponding function in your application. No: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. The model uses the result to generate a final, user-friendly response that incorporates the information from the function call. 
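The four steps above can be sketched as a plain dispatch routine, with no SDK involved. This is only an illustration under stated assumptions, not the Gemini client: the `FUNCTION_REGISTRY` mapping, the `handle_model_turn` helper, and the simplified dict shape used for a model turn are hypothetical, and `get_current_temperature` is a mock that returns a fixed value rather than calling a weather service.

```python
from typing import Any, Callable


def get_current_temperature(location: str) -> dict[str, Any]:
    """Hypothetical mock; a real app would query a weather service here."""
    return {"location": location, "temperature_c": 15}


# Hypothetical registry mapping declared function names to local callables.
FUNCTION_REGISTRY: dict[str, Callable[..., Any]] = {
    "get_current_temperature": get_current_temperature,
}


def handle_model_turn(turn: dict[str, Any]) -> dict[str, Any]:
    """Run the requested function (step 3) or pass a text reply through.

    `turn` uses an assumed, simplified shape: either
    {"function_call": {"name": ..., "args": {...}}} or {"text": "..."}.
    """
    call = turn.get("function_call")
    if call is None:
        # "No" branch: the model answered directly with text.
        return {"kind": "text", "text": turn.get("text", "")}

    func = FUNCTION_REGISTRY.get(call["name"])
    if func is None:
        # Report unknown names as data instead of raising an exception.
        return {
            "kind": "function_response",
            "name": call["name"],
            "response": {"error": f"unknown function: {call['name']}"},
        }

    result = func(**call["args"])
    # This payload is what step 4 would send back to the model.
    return {
        "kind": "function_response",
        "name": call["name"],
        "response": {"result": result},
    }


if __name__ == "__main__":
    model_turn = {
        "function_call": {
            "name": "get_current_temperature",
            "args": {"location": "London"},
        }
    }
    print(handle_model_turn(model_turn))
```

Keeping the name-to-callable lookup in one registry means the application, not the model, decides which functions can actually run; an undeclared or misspelled name comes back as an error payload instead of executing anything.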
This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn ( parallel function calling ) and in sequence ( compositional function calling ). Step 1: Define a function declaration Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs. Python # Define a function that the model can call to control smart lights set_light_values_declaration = { "name" : "set_light_values" , "description" : "Sets the brightness and color temperature of a light." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "integer" , "description" : "Light level from 0 to 100. Zero is off and 100 is full brightness" , }, "color_temp" : { "type" : "string" , "enum" : [ "daylight" , "cool" , "warm" ], "description" : "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`." , }, }, "required" : [ "brightness" , "color_temp" ], }, } # This is the actual function that would be called based on the model's suggestion def set_light_values ( brightness : int , color_temp : str ) - > dict [ str , int | str ]: """Set the brightness and color temperature of a room light. (mock API). Args: brightness: Light level from 0 to 100. Zero is off and 100 is full brightness color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. Returns: A dictionary containing the set brightness and color temperature. """ return { "brightness" : brightness , "colorTemperature" : color_temp } JavaScript import { Type } from '@google/genai' ; // Define a function that the model can call to control smart lights const setLightValuesFunctionDeclaration = { name : 'set_light_values' , description : 'Sets the brightness and color temperature of a light.' , parameters : { type : Type . 
OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'Light level from 0 to 100. Zero is off and 100 is full brightness' , }, color_temp : { type : Type . STRING , enum : [ 'daylight' , 'cool' , 'warm' ], description : 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.' , }, }, required : [ 'brightness' , 'color_temp' ], }, }; /** * Set the brightness and color temperature of a room light. (mock API) * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. * @return {Object} A dictionary containing the set brightness and color temperature. */ function setLightValues ( brightness , color_temp ) { return { brightness : brightness , colorTemperature : color_temp }; } Step 2: Call the model with function declarations Once you have defined your function declarations, you can prompt the model to use them. The model analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion. Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () tools = types . Tool ( function_declarations = [ set_light_values_declaration ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Define user prompt contents = [ types . Content ( role = "user" , parts = [ types . Part ( text = "Turn the lights down to a romantic level" )] ) ] # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = contents , config = config , ) print ( response . candidates [ 0 ] . content . parts [ 0 ] . 
function_call ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Generation config with function declaration const config = { tools : [{ functionDeclarations : [ setLightValuesFunctionDeclaration ] }] }; // Configure the client const ai = new GoogleGenAI ({}); // Define user prompt const contents = [ { role : 'user' , parts : [{ text : 'Turn the lights down to a romantic level' }] } ]; // Send request with function declarations const response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( response . functionCalls [ 0 ]); The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question. Python id = None args = { 'color_temp' : 'warm' , 'brightness' : 25 } name = 'set_light_values' JavaScript { name : 'set_light_values' , args : { brightness : 25 , color_temp : 'warm' } } Step 3: Execute set_light_values function code Extract the function call details from the model's response, parse the arguments , and execute the set_light_values function. Python # Extract tool call details, it may not be in the first part. tool_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call if tool_call . name == "set_light_values" : result = set_light_values ( ** tool_call . args ) print ( f "Function execution result: { result } " ) JavaScript // Extract tool call details const tool_call = response . functionCalls [ 0 ] let result ; if ( tool_call . name === 'set_light_values' ) { result = setLightValues ( tool_call . args . brightness , tool_call . args . color_temp ); console . log ( `Function execution result: ${ JSON . 
stringify ( result ) } ` ); } Step 4: Create user friendly response with function result and call the model again Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. Python # Create a function response part function_response_part = types . Part . from_function_response ( name = tool_call . name , response = { "result" : result }, ) # Append function call and result of the function execution to contents contents . append ( response . candidates [ 0 ] . content ) # Append the content from the model's response. contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Create a function response part const function_response_part = { name : tool_call . name , response : { result } }; // Append function call and result of the function execution to contents contents . push ( response . candidates [ 0 ]. content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); // Get the final response from the model const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); This completes the function calling flow. The model successfully used the set_light_values function to perform the action the user requested. Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations. You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). 
Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. Function calling with thinking Enabling "thinking" can improve function call performance by allowing the model to reason through a request before suggesting function calls. However, because the Gemini API is stateless, this reasoning context is lost between turns, which can reduce the quality of function calls as they require multiple turn requests. To preserve this context you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. To use thought signatures: Receive the signature: When thinking is enabled, the API response will include a thought_signature field containing an encrypted representation of the model's reasoning. 
Return the signature: When you send the function's execution result back to the server, include the thought_signature you received. This allows the model to restore its previous thinking context and will likely result in better function calling performance. Receiving signatures from the server Signatures are returned in the part after the model's thinking phase, which typically is a text or function call. Here are some examples of what thought signatures look like returned in each type of part, in response to the request "What's the weather in Lake Tahoe?" using the Get Weather example: Text part [{ "candidates" : [ { "content" : { "parts" : [ { "text" : "Here's what the weather in Lake Tahoe is today" , "thoughtSignature" : "ClcBVKhc7ru7KzUI7SrdUoIdAYLm/+i93aHjfIt4xHyAoO/G70tApxnK2ujBhOhC1PrRy1pkQa88fqFvpHNVd1HDjNLO7mkp6/hFwE+SPPEB3fh0hs4oM8MKhgIBVKhc7uIGvrS7i/T4HpfbnYrluFfWNjZ62gewqe4cVdR/Dlh+zbjtYmDD0gPZ+SuBO7vvHQdzsjePRP+2Y5XddX6LEf/cGGgakq8EhVvw/a6IVzUO6XmpHg2Ag1sl8E9+VFH/lC0R0ZuYdFWligtDuYwp5p5q3o59G0TtWeU2MC1y2MJfE9u/KWd313ldka80/X2W/xF2O/4djMp5G2WKcULfve75zeRCy0mc5iS3SB9mTH0cT6x0vtKjeBx50gcg+CQWtJcRuwTVzz54dmvmK9xvnqA8gKGw3DuaM9wfy5hyY7Qg0z3iyyWdP8T/lbjKim8IEQOk7O1vVwP1Ko7oMYH8JgA1CsoBAVSoXO6v4c5RSyd1cn6EIU0pEFQsjW7rYWPuZdOFq/tsGJT9BCfW7KGkPGwlNSq8jTJFvbcJ/DjtndISQYXwiXd2kGa5JfdS2Kh4zOxCxiWtOk+2nCc3+XQk2nonhO+esGJpkDdbbHZSqRgcUtYKq7q28iPFOQvOFyCiZNB7K86Z/6Hnagu2snSlN/BcTMaFGaWpcCClSUo4foRZn3WbNCoM8rcpD7qEJMp4a5baaSxyyeL1ZTGd2HLpFys/oiW6e3oAnhxuIysCwg==" } ] , "role" : "model" } , "index" : 0 } ] , # Remainder of response... 
Function call part [{ "candidates" : [ { "content" : { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiwBVKhc7nRyTi3HmggPD9iQiRc261f5jwuMdw3H/itDH0emsb9ZVo3Nwx9p6wpsAVSoXO5i8fDV4jBSBLoaWxB5zUdlGY6aIGp+I0oEnwRRSRQ1LOvrDlojEH8JE8HjiKXALdJrvNPiG+HY3GZEO8pZjEZtc3UoBUh7+SVyjK7Xolu7aRYYeUyzrCapoETWypER1jbrJXnFV23hCosBAVSoXO6oIPNJSmbuEDfGafOhuCSHkpr1yjTp35RXYqmCESzRzWf5+nFXLqncqeFo4ohoxbiYQVpVQbOZF81p8o9zg6xeRE7qMeOv+XN7enXGJ4/s3qNFQpfkSMqRdBITN1VpX7jyfEAjvxBNc7PDfDJZmEPY338ZIY5nFFcmzJSWjVrboFt2sMFv+A==" } ] , "role" : "model" } , "finishReason" : "STOP" , "index" : 0 } ] , # Remainder of response... You can confirm that you received a signature and see what a signature looks like using the following code: # Step 2: Call the model with function declarations # ...Generation config, Configure the client, and Define user prompt (No changes) # Send request with declarations (using a thinking model) response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents ) # See thought signatures for part in response . candidates [ 0 ] . content . parts : if part . thought_signature : print ( "Thought signature:" ) print ( part . thought_signature ) Returning signatures back to the server In order to return signatures back: You should return signatures along with their containing parts back to the server You shouldn't merge a part with a signature with another part which also contains a signature. The signature string is not concatenable You shouldn't merge one part with a signature with another part without a signature. This breaks the correct positioning of the thought represented by the signature. The code will remain the same as in Step 4 of the previous section. 
But in this case (as indicated in the comment below) you will return signatures to the model along with the result of the function execution so the model can incorporate the thoughts into its final response: Python # Step 4: Create user friendly response with function result and call the model again # ...Create a function response part (No change) # Append thought signatures, function call and result of the function execution to contents function_call_content = response . candidates [ 0 ] . content # Append the model's function call message, which includes thought signatures contents . append ( function_call_content ) contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Step 4: Create user friendly response with function result and call the model again // ...Create a function response part (No change) // Append thought signatures, function call and result of the function execution to contents const function_response_content = response . candidates [ 0 ]. content ; contents . push ( function_response_content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); The following shows what a request returning a thought signature may look like: [{ "contents" : [ { "role" : "user" , "parts" : [ { "text" : "what is the weather in Lake Tahoe?" 
} ] } , { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiIBVKhc7oDPpCaXyJKKssjqr4g3JNOSgJ/M2V+1THC1icsWCmwBVKhc7pBABbZ+zR3e9234WnWWS6GFXmf8IVwpnzjd5KYd7vyJbn/4vTorWBGayj/vbd9JPaZQjxdAIXhoE5mX/MDsQ7M9N/b0qJjHm39tYIBvS4sIWkMDHqTJqXGLzhhKtrTkfbV3RbaJEkQKmwEBVKhc7qVUgC3hfTXZLo9R3AJzUUIx50NKvJTb9B+UU+LBqgg7Nck1x5OpjWVS2R+SsveprIuYOruk2Y0H53J2OJF8qsxTdIq2si8DGW2V7WK8xyoJH5kbqd7drIw1jLb44b6lx4SMyB0VaULuTBki4d+Ljjg1tJTwR0IYMKqDLDZt9mheINsi0ZxcNjfpnDydRXdWbcSwzmK/wgqJAQFUqFzuKgNVElxs3cbO+xebr2IwcOro84nKTisi0tTp9bICPC9fTUhn3L+rvQWA+d3J1Za8at2bakrqiRj7BTh+CVO9fWQMAEQAs3ni0Z2hfaYG92tOD26E4IoZwyYEoWbfNudpH1fr5tEkyqnEGtWIh7H+XoZQ2DXeiOa+br7Zk88SrNE+trJMCogBAVSoXO5e9fBLg7hnbkmKsrzNLnQtLsQm1gNzjcjEC7nJYklYPp0KI2uGBE1PkM8XNsfllAfHVn7LzHcHNlbQ9pJ7QZTSIeG42goS971r5wNZwxaXwCTphClQh826eqJWo6A/28TtAVQWLhTx5ekbP7qb4nh1UblESZ1saxDQAEo4OKPbDzx5BgqKAQFUqFzuVyjNm5i0wN8hTDnKjfpDroEpPPTs531iFy9BOX+xDCdGHy8D+osFpaoBq6TFekQQbz4hIoUR1YEcP4zI80/cNimEeb9IcFxZTTxiNrbhbbcv0969DSMWhB+ZEqIz4vuw4GLe/xcUvqhlChQwFdgIbdOQHSHpatn5uDlktnP/bi26nKuXIwo0AVSoXO7US22OUH7d1f4abNPI0IyAvhqkPp12rbtWLx9vkOtojE8IP+xCfYtIFuZIzRNZqA==" } ] , "role" : "model" } , { "role" : "user" , "parts" : [ { "functionResponse" : { "name" : "getWeather" , "response" : { "response" : { "stringValue" : "Sunny and hot. 90 degrees Fahrenheit" } } } } ] } ] , # Remainder of request... Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page. Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. 
This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' 
} }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . 
sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) => `${ key }=${ val }` ) . join ( ', ' ); console . log ( `${ fn . name }(${ args })` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which converts Python functions to declarations and handles the function call execution and response cycle for you. The following example applies it to the disco use case. Note: Automatic function calling is a Python SDK only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) -> dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. """ return { "status" : f"Disco ball powered { 'on' if power else 'off' }" } def start_music_impl ( energetic : bool , loud : bool ) -> dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) -> dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai .
Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to turn this place into a party!" , config = config , ) print ( "\nExample 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task.
import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) -> dict : """Gets the current weather temperature for a given location.""" print ( f"Tool Call: get_weather_forecast(location={ location })" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) -> dict : """Sets the thermostat to a desired temperature.""" print ( f"Tool Call: set_thermostat_temperature(temperature={ temperature })" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call: get_weather_forecast(location=London) Tool Response: {'temperature': 25, 'unit': 'celsius'} Tool Call: set_thermostat_temperature(temperature=20) Tool Response: {'status': 'success'} OK. I've set the thermostat to 20°C. JavaScript This example shows how to use the JavaScript/TypeScript SDK to do compositional function calling using a manual execution loop.
import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type . NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. 
const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the loop orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call: get_weather_forecast(location=London) Tool Response: {'temperature': 25, 'unit': 'celsius'} Tool Call: set_thermostat_temperature(temperature=20) Tool Response: {'status': 'success'} OK. It's 25°C in London, so I've set the thermostat to 20°C. Compositional function calling is a native Live API feature. This means the Live API can handle the function calling similarly to the Python SDK. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights?
` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , { tools : tools , modality : "AUDIO" }); Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here.
toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK converts the Python functions to declarations and handles the function call execution and the response cycle for you. The Python SDK then automatically: Detects function call responses from the model. Calls the corresponding Python function in your code. Sends the function response back to the model. Returns the model's final text response. To use this, define your function with type hints and a docstring, and then pass the function itself (not a JSON declaration) as a tool: Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) -> dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration Automatic schema extraction from Python functions doesn't work in all cases. For example, it doesn't handle cases where you describe the fields of a nested dictionary-object.
The API is able to describe any of the following types: Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | dict [ str , 'AllowedType' ]) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools, combining native tools with function calling, at the same time. Here's an example that enables two native tools, Grounding with Google Search and code execution , alongside function declarations in a request using the Live API . Note: Multi-tool use is a Live API only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks!
` ; const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } // not defined here. ]; // Execute the prompt with specified tools in audio modality await run ( prompt , { tools : tools , modality : "AUDIO" }); Python developers can try this out in the Live API Tool Use notebook . Model context protocol (MCP) Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. The Gemini SDKs have built-in support for MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDKs can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until no more tool calls are made by the model. The following example shows how to use a local MCP server with Gemini and the mcp SDK. Python Make sure the latest version of the mcp SDK is installed on your platform of choice. pip install mcp Note: Python supports automatic tool calling by passing the ClientSession into the tools parameter. If you want to disable it, you can provide automatic_function_calling with disable set to True . import os import asyncio from datetime import datetime from mcp import ClientSession , StdioServerParameters from mcp.client.stdio import stdio_client from google import genai client = genai .
Client () # Create server parameters for stdio connection server_params = StdioServerParameters ( command = "npx" , # Executable args = [ "-y" , "@philschmid/weather-mcp" ], # MCP Server env = None , # Optional environment variables ) async def run (): async with stdio_client ( server_params ) as ( read , write ): async with ClientSession ( read , write ) as session : # Prompt to get the weather for the current day in London. prompt = f"What is the weather in London in { datetime . now () . strftime ( '%Y-%m-%d' ) }?" # Initialize the connection between client and server await session . initialize () # Send request to the model with MCP function declarations response = await client . aio . models . generate_content ( model = "gemini-2.5-flash" , contents = prompt , config = genai . types . GenerateContentConfig ( temperature = 0 , tools = [ session ], # uses the session, will automatically call the tool # Uncomment if you **don't** want the SDK to automatically call the tool # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( # disable=True # ), ), ) print ( response . text ) # Start the asyncio event loop and run the main function asyncio . run ( run ()) JavaScript Make sure the latest version of the mcp SDK is installed on your platform of choice. npm install @modelcontextprotocol/sdk Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool . If you want to disable it, you can provide automaticFunctionCalling with disable set to true .
import { GoogleGenAI , FunctionCallingConfigMode , mcpToTool } from '@google/genai' ; import { Client } from "@modelcontextprotocol/sdk/client/index.js" ; import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js" ; // Create server parameters for stdio connection const serverParams = new StdioClientTransport ({ command : "npx" , // Executable args : [ "-y" , "@philschmid/weather-mcp" ] // MCP Server }); const client = new Client ( { name : "example-client" , version : "1.0.0" } ); // Configure the client const ai = new GoogleGenAI ({}); // Initialize the connection between client and server await client . connect ( serverParams ); // Send request to the model with MCP tools const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : `What is the weather in London in ${ new Date (). toLocaleDateString () } ?` , config : { tools : [ mcpToTool ( client )], // uses the session, will automatically call the tool // Uncomment if you **don't** want the sdk to automatically call the tool // automaticFunctionCalling: { // disable: true, // }, }, }); console . log ( response . text ) // Close the connection await client . close (); Limitations with built-in MCP support Built-in MCP support is an experimental feature in our SDKs and has the following limitations: Only tools are supported, not resources or prompts. It is available for the Python and JavaScript/TypeScript SDKs. Breaking changes might occur in future releases. Manual integration of MCP servers is always an option if these limit what you're building. Supported models This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page.
Model | Function Calling | Parallel Function Calling | Compositional Function Calling
Gemini 2.5 Pro | ✔️ | ✔️ | ✔️
Gemini 2.5 Flash | ✔️ | ✔️ | ✔️
Gemini 2.5 Flash-Lite | ✔️ | ✔️ | ✔️
Gemini 2.0 Flash | ✔️ | ✔️ | ✔️
Gemini 2.0 Flash-Lite | X | X | X
Best practices Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments. Naming: Use descriptive function names (without spaces, periods, or dashes). Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum. Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools. Prompt Engineering: Provide context: Tell the model its role (e.g., "You are a helpful weather assistant."). Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts."). Encourage clarification: Instruct the model to ask clarifying questions if needed. Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls. Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it. Error Handling : Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. Return informative error messages that the model can use to generate helpful responses to the user. Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls.
Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, or breaking down complex tasks into smaller, more focused function sets. Notes and limitations Only a subset of the OpenAPI schema is supported. Supported parameter types in Python are limited. Automatic function calling is a Python SDK feature only. Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Last updated 2025-07-10 UTC.
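As a rough illustration of the error-handling practice listed under Best practices, the sketch below wraps the execution of a requested function call so that failures come back as an informative message instead of an exception. It is a minimal sketch under stated assumptions, not part of the Gemini SDK: TOOL_REGISTRY, execute_tool_call, and the mock get_weather implementation are hypothetical names introduced only for this example, and the name and args values stand in for the fields of a function call returned by the model.

```python
from typing import Any, Callable


def get_weather(city: str) -> dict[str, Any]:
    """Mock tool (hypothetical); a real one would call a weather service."""
    if not city:
        raise ValueError("city must be a non-empty string")
    return {"city": city, "forecast": "Sunny", "temperature_f": 90}


# Hypothetical registry mapping declared function names to local callables.
TOOL_REGISTRY: dict[str, Callable[..., dict[str, Any]]] = {
    "getWeather": get_weather,
}


def execute_tool_call(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Run one requested function call and always return a plain dict.

    Failures are reported as {"error": "..."} so the message can be sent
    back to the model as the function response.
    """
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return {"error": f"Unknown function: {name}"}
    try:
        return {"result": func(**args)}
    except TypeError as exc:
        # Typically wrong or missing argument names; a TypeError raised
        # inside the tool itself would also be reported here.
        return {"error": f"Invalid arguments for {name}: {exc}"}
    except Exception as exc:
        # Any other failure inside the tool.
        return {"error": f"{name} failed: {exc}"}


if __name__ == "__main__":
    print(execute_tool_call("getWeather", {"city": "Lake Tahoe"}))
    print(execute_tool_call("getWeather", {"city": ""}))
    print(execute_tool_call("bookFlight", {}))
```

The returned dictionary could then be used as the response payload of the function response part built in Step 4, so the model sees either the result or a description of what went wrong. Confirming consequential calls with the user, as the Validation item suggests, would be a separate check before calling execute_tool_call.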
|
text_content/docs_function-calling_84826ed5.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/function-calling#compositional_function_calling
|
| 2 |
+
Title: Function calling with the Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Function calling with the Gemini API Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. Extend Capabilities: Use external tools to perform computations and overcome the limitations of the model, such as using a calculator or creating charts. Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. How function calling works Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: Define Function Declaration: Define the function declaration in your application code. Function Declarations describe the function's name, parameters, and purpose to the model. Call LLM with function declarations: Send the user prompt along with the function declaration(s) to the model. It analyzes the request and determines if a function call would be helpful. If so, it responds with a structured JSON object. Execute Function Code (Your Responsibility): The model does not execute the function itself.
It's your application's responsibility to process the response and check for a function call. If yes: Extract the name and args of the function and execute the corresponding function in your application. If no: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. It will use the result to generate a final, user-friendly response that incorporates the information from the function call. This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn ( parallel function calling ) and in sequence ( compositional function calling ). Step 1: Define a function declaration Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs. Python # Define a function that the model can call to control smart lights set_light_values_declaration = { "name" : "set_light_values" , "description" : "Sets the brightness and color temperature of a light." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "integer" , "description" : "Light level from 0 to 100. Zero is off and 100 is full brightness" , }, "color_temp" : { "type" : "string" , "enum" : [ "daylight" , "cool" , "warm" ], "description" : "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`." , }, }, "required" : [ "brightness" , "color_temp" ], }, } # This is the actual function that would be called based on the model's suggestion def set_light_values ( brightness : int , color_temp : str ) -> dict [ str , int | str ]: """Set the brightness and color temperature of a room light. (mock API).
Args: brightness: Light level from 0 to 100. Zero is off and 100 is full brightness color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. Returns: A dictionary containing the set brightness and color temperature. """ return { "brightness" : brightness , "colorTemperature" : color_temp } JavaScript import { Type } from '@google/genai' ; // Define a function that the model can call to control smart lights const setLightValuesFunctionDeclaration = { name : 'set_light_values' , description : 'Sets the brightness and color temperature of a light.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'Light level from 0 to 100. Zero is off and 100 is full brightness' , }, color_temp : { type : Type . STRING , enum : [ 'daylight' , 'cool' , 'warm' ], description : 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.' , }, }, required : [ 'brightness' , 'color_temp' ], }, }; /** * Set the brightness and color temperature of a room light. (mock API) * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. * @return {Object} A dictionary containing the set brightness and color temperature. */ function setLightValues ( brightness , color_temp ) { return { brightness : brightness , colorTemperature : color_temp }; } Step 2: Call the model with function declarations Once you have defined your function declarations, you can prompt the model to use them. It analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion. Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () tools = types . 
Tool ( function_declarations = [ set_light_values_declaration ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Define user prompt contents = [ types . Content ( role = "user" , parts = [ types . Part ( text = "Turn the lights down to a romantic level" )] ) ] # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = contents , config = config , ) print ( response . candidates [ 0 ] . content . parts [ 0 ] . function_call ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Generation config with function declaration const config = { tools : [{ functionDeclarations : [ setLightValuesFunctionDeclaration ] }] }; // Configure the client const ai = new GoogleGenAI ({}); // Define user prompt const contents = [ { role : 'user' , parts : [{ text : 'Turn the lights down to a romantic level' }] } ]; // Send request with function declarations const response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( response . functionCalls [ 0 ]); The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question. Python id = None args = { 'color_temp' : 'warm' , 'brightness' : 25 } name = 'set_light_values' JavaScript { name : 'set_light_values' , args : { brightness : 25 , color_temp : 'warm' } } Step 3: Execute set_light_values function code Extract the function call details from the model's response, parse the arguments , and execute the set_light_values function. Python # Extract tool call details, it may not be in the first part. tool_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call if tool_call . name == "set_light_values" : result = set_light_values ( ** tool_call . 
args ) print ( f"Function execution result: {result}" ) JavaScript // Extract tool call details const tool_call = response . functionCalls [ 0 ]; let result ; if ( tool_call . name === 'set_light_values' ) { result = setLightValues ( tool_call . args . brightness , tool_call . args . color_temp ); console . log ( `Function execution result: ${ JSON . stringify ( result ) }` ); } Step 4: Create user friendly response with function result and call the model again Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. Python # Create a function response part function_response_part = types . Part . from_function_response ( name = tool_call . name , response = { "result" : result }, ) # Append function call and result of the function execution to contents contents . append ( response . candidates [ 0 ] . content ) # Append the content from the model's response. contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Create a function response part const function_response_part = { name : tool_call . name , response : { result } } // Append function call and result of the function execution to contents contents . push ( response . candidates [ 0 ]. content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); // Get the final response from the model const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); This completes the function calling flow. The model successfully used the set_light_values function to perform the user's requested action. 
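Steps 3 and 4 depend on mapping the function name returned by the model to code in your application. The following is a minimal sketch of one way to do that in plain Python, without any SDK calls. The FUNCTION_REGISTRY dictionary and the dispatch_function_call helper are hypothetical names introduced for illustration and are not part of the Gemini SDK; the shape of the error payload is likewise an assumption, not a documented convention.

```python
from typing import Any, Callable


def set_light_values(brightness: int, color_temp: str) -> dict[str, Any]:
    """Mock light API, mirroring the function defined in Step 1."""
    return {"brightness": brightness, "colorTemperature": color_temp}


# Hypothetical registry: maps declared function names to local callables.
FUNCTION_REGISTRY: dict[str, Callable[..., Any]] = {
    "set_light_values": set_light_values,
}


def dispatch_function_call(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Run the local function the model asked for and wrap its result.

    Returns {"result": ...} on success, or {"error": ...} (an assumed
    payload shape) when the name is unknown or the arguments do not fit.
    """
    handler = FUNCTION_REGISTRY.get(name)
    if handler is None:
        return {"error": f"Unknown function: {name}"}
    try:
        return {"result": handler(**args)}
    except TypeError as exc:
        # Raised when required arguments are missing or unexpected ones appear.
        return {"error": f"Invalid arguments for {name}: {exc}"}


if __name__ == "__main__":
    # Arguments as they appear in the Step 2 example response.
    print(dispatch_function_call("set_light_values",
                                 {"brightness": 25, "color_temp": "warm"}))
```

The returned dictionary could then be passed as the response argument when building the function response part in Step 4. Returning an error payload instead of raising lets the model see what went wrong and possibly retry, though whether that suits your application is a design choice.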
Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations . You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. Function calling with thinking Enabling "thinking" can improve function call performance by allowing the model to reason through a request before suggesting function calls. However, because the Gemini API is stateless, this reasoning context is lost between turns, which can reduce the quality of function calls as they require multiple turn requests. 
To preserve this context you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. To use thought signatures: Receive the signature: When thinking is enabled, the API response will include a thought_signature field containing an encrypted representation of the model's reasoning. Return the signature: When you send the function's execution result back to the server, include the thought_signature you received. This allows the model to restore its previous thinking context and will likely result in better function calling performance. Receiving signatures from the server Signatures are returned in the part after the model's thinking phase, which typically is a text or function call. Here are some examples of what thought signatures look like returned in each type of part, in response to the request "What's the weather in Lake Tahoe?" using the Get Weather example: Text part [{ "candidates" : [ { "content" : { "parts" : [ { "text" : "Here's what the weather in Lake Tahoe is today" , "thoughtSignature" : "ClcBVKhc7ru7KzUI7SrdUoIdAYLm/+i93aHjfIt4xHyAoO/G70tApxnK2ujBhOhC1PrRy1pkQa88fqFvpHNVd1HDjNLO7mkp6/hFwE+SPPEB3fh0hs4oM8MKhgIBVKhc7uIGvrS7i/T4HpfbnYrluFfWNjZ62gewqe4cVdR/Dlh+zbjtYmDD0gPZ+SuBO7vvHQdzsjePRP+2Y5XddX6LEf/cGGgakq8EhVvw/a6IVzUO6XmpHg2Ag1sl8E9+VFH/lC0R0ZuYdFWligtDuYwp5p5q3o59G0TtWeU2MC1y2MJfE9u/KWd313ldka80/X2W/xF2O/4djMp5G2WKcULfve75zeRCy0mc5iS3SB9mTH0cT6x0vtKjeBx50gcg+CQWtJcRuwTVzz54dmvmK9xvnqA8gKGw3DuaM9wfy5hyY7Qg0z3iyyWdP8T/lbjKim8IEQOk7O1vVwP1Ko7oMYH8JgA1CsoBAVSoXO6v4c5RSyd1cn6EIU0pEFQsjW7rYWPuZdOFq/tsGJT9BCfW7KGkPGwlNSq8jTJFvbcJ/DjtndISQYXwiXd2kGa5JfdS2Kh4zOxCxiWtOk+2nCc3+XQk2nonhO+esGJpkDdbbHZSqRgcUtYKq7q28iPFOQvOFyCiZNB7K86Z/6Hnagu2snSlN/BcTMaFGaWpcCClSUo4foRZn3WbNCoM8rcpD7qEJMp4a5baaSxyyeL1ZTGd2HLpFys/oiW6e3oAnhxuIysCwg==" } ] , "role" : "model" } , "index" : 0 } ] , # Remainder of response... 
Function call part [{ "candidates" : [ { "content" : { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiwBVKhc7nRyTi3HmggPD9iQiRc261f5jwuMdw3H/itDH0emsb9ZVo3Nwx9p6wpsAVSoXO5i8fDV4jBSBLoaWxB5zUdlGY6aIGp+I0oEnwRRSRQ1LOvrDlojEH8JE8HjiKXALdJrvNPiG+HY3GZEO8pZjEZtc3UoBUh7+SVyjK7Xolu7aRYYeUyzrCapoETWypER1jbrJXnFV23hCosBAVSoXO6oIPNJSmbuEDfGafOhuCSHkpr1yjTp35RXYqmCESzRzWf5+nFXLqncqeFo4ohoxbiYQVpVQbOZF81p8o9zg6xeRE7qMeOv+XN7enXGJ4/s3qNFQpfkSMqRdBITN1VpX7jyfEAjvxBNc7PDfDJZmEPY338ZIY5nFFcmzJSWjVrboFt2sMFv+A==" } ] , "role" : "model" } , "finishReason" : "STOP" , "index" : 0 } ] , # Remainder of response... You can confirm that you received a signature and see what a signature looks like using the following code: # Step 2: Call the model with function declarations # ...Generation config, Configure the client, and Define user prompt (No changes) # Send request with declarations (using a thinking model) response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents ) # See thought signatures for part in response . candidates [ 0 ] . content . parts : if part . thought_signature : print ( "Thought signature:" ) print ( part . thought_signature ) Returning signatures back to the server In order to return signatures back: You should return signatures along with their containing parts back to the server You shouldn't merge a part with a signature with another part which also contains a signature. The signature string is not concatenable You shouldn't merge one part with a signature with another part without a signature. This breaks the correct positioning of the thought represented by the signature. The code will remain the same as in Step 4 of the previous section. 
But in this case (as indicated in the comment below) you will return signatures to the model along with the result of the function execution so the model can incorporate the thoughts into its final response: Python # Step 4: Create user friendly response with function result and call the model again # ...Create a function response part (No change) # Append thought signatures, function call and result of the function execution to contents function_call_content = response . candidates [ 0 ] . content # Append the model's function call message, which includes thought signatures contents . append ( function_call_content ) contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Step 4: Create user friendly response with function result and call the model again // ...Create a function response part (No change) // Append thought signatures, function call and result of the function execution to contents const function_response_content = response . candidates [ 0 ]. content ; contents . push ( function_response_content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); The following shows what a request returning a thought signature may look like: [{ "contents" : [ { "role" : "user" , "parts" : [ { "text" : "what is the weather in Lake Tahoe?" 
} ] } , { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiIBVKhc7oDPpCaXyJKKssjqr4g3JNOSgJ/M2V+1THC1icsWCmwBVKhc7pBABbZ+zR3e9234WnWWS6GFXmf8IVwpnzjd5KYd7vyJbn/4vTorWBGayj/vbd9JPaZQjxdAIXhoE5mX/MDsQ7M9N/b0qJjHm39tYIBvS4sIWkMDHqTJqXGLzhhKtrTkfbV3RbaJEkQKmwEBVKhc7qVUgC3hfTXZLo9R3AJzUUIx50NKvJTb9B+UU+LBqgg7Nck1x5OpjWVS2R+SsveprIuYOruk2Y0H53J2OJF8qsxTdIq2si8DGW2V7WK8xyoJH5kbqd7drIw1jLb44b6lx4SMyB0VaULuTBki4d+Ljjg1tJTwR0IYMKqDLDZt9mheINsi0ZxcNjfpnDydRXdWbcSwzmK/wgqJAQFUqFzuKgNVElxs3cbO+xebr2IwcOro84nKTisi0tTp9bICPC9fTUhn3L+rvQWA+d3J1Za8at2bakrqiRj7BTh+CVO9fWQMAEQAs3ni0Z2hfaYG92tOD26E4IoZwyYEoWbfNudpH1fr5tEkyqnEGtWIh7H+XoZQ2DXeiOa+br7Zk88SrNE+trJMCogBAVSoXO5e9fBLg7hnbkmKsrzNLnQtLsQm1gNzjcjEC7nJYklYPp0KI2uGBE1PkM8XNsfllAfHVn7LzHcHNlbQ9pJ7QZTSIeG42goS971r5wNZwxaXwCTphClQh826eqJWo6A/28TtAVQWLhTx5ekbP7qb4nh1UblESZ1saxDQAEo4OKPbDzx5BgqKAQFUqFzuVyjNm5i0wN8hTDnKjfpDroEpPPTs531iFy9BOX+xDCdGHy8D+osFpaoBq6TFekQQbz4hIoUR1YEcP4zI80/cNimEeb9IcFxZTTxiNrbhbbcv0969DSMWhB+ZEqIz4vuw4GLe/xcUvqhlChQwFdgIbdOQHSHpatn5uDlktnP/bi26nKuXIwo0AVSoXO7US22OUH7d1f4abNPI0IyAvhqkPp12rbtWLx9vkOtojE8IP+xCfYtIFuZIzRNZqA==" } ] , "role" : "model" } , { "role" : "user" , "parts" : [ { "functionResponse" : { "name" : "getWeather" , "response" : { "response" : { "stringValue" : "Sunny and hot. 90 degrees Fahrenheit" } } } } ] } ] , # Remainder of request... Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page. Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. 
This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' 
} }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . 
sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) => `${ key }=${ val }` ) . join ( ', ' ); console . log ( `${ fn . name }(${ args })` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which converts Python functions to declarations and handles the function call execution and response cycle for you. The following is an example for the disco use case. Note: Automatic Function Calling is a Python SDK only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) -> dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. """ return { "status" : f"Disco ball powered {'on' if power else 'off'}" } def start_music_impl ( energetic : bool , loud : bool ) -> dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) -> dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai . 
Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to turn this place into a party!" , config = config , ) print ( "\nExample 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task. 
import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) -> dict : """Gets the current weather temperature for a given location.""" print ( f"Tool Call: get_weather_forecast(location={location})" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) -> dict : """Sets the thermostat to a desired temperature.""" print ( f"Tool Call: set_thermostat_temperature(temperature={temperature})" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call: get_weather_forecast(location=London) Tool Response: {'temperature': 25, 'unit': 'celsius'} Tool Call: set_thermostat_temperature(temperature=20) Tool Response: {'status': 'success'} OK. I've set the thermostat to 20°C. JavaScript This example shows how to use the JavaScript/TypeScript SDK to do compositional function calling using a manual execution loop. 
import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type . NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. 
const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the manual loop orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call: get_weather_forecast(location=London) Tool Response: {'temperature': 25, 'unit': 'celsius'} Tool Call: set_thermostat_temperature(temperature=20) Tool Response: {'status': 'success'} OK. It's 25°C in London, so I've set the thermostat to 20°C. Compositional function calling is a native Live API feature. This means the Live API can handle function calling in a way similar to the Python SDK. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you write and run some Python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you write and run some Python code to turn on the lights, wait 10s and then turn off the lights? 
` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , { tools : tools , modality : "AUDIO" }); Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here. 
toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK converts the Python functions to declarations and handles the function call execution and the response cycle for you. The Python SDK then automatically: Detects function call responses from the model. Calls the corresponding Python function in your code. Sends the function response back to the model. Returns the model's final text response. To use this, define your function with type hints and a docstring, and then pass the function itself (not a JSON declaration) as a tool: Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) -> dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration Automatic schema extraction from Python functions doesn't work in all cases. For example, it doesn't handle cases where you describe the fields of a nested dictionary-object. 
The API is able to describe any of the following types: Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | dict [ str , AllowedType ]) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools, combining native tools with function calling, at the same time. Here's an example that enables two tools, Grounding with Google Search and code execution , in a request using the Live API . Note: Multi-tool use is a Live API only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! 
    `;

    const tools = [
      { googleSearch: {} },
      { codeExecution: {} },
      { functionDeclarations: [turnOnTheLightsSchema, turnOffTheLightsSchema] } // not defined here.
    ];

    // Execute the prompt with specified tools in audio modality
    await run(prompt, { tools: tools, modality: "AUDIO" });

Python developers can try this out in the Live API Tool Use notebook.

Model context protocol (MCP)

Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts.

The Gemini SDKs have built-in support for the MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDKs can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until the model makes no more tool calls.

The following example shows how to use a local MCP server with Gemini and the mcp SDK.

Python

Make sure the latest version of the mcp SDK is installed on your platform of choice.

    pip install mcp

Note: Python supports automatic tool calling by passing the ClientSession into the tools parameter. If you want to disable it, you can provide automatic_function_calling with disable set to True.

    import os
    import asyncio
    from datetime import datetime
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client
    from google import genai

    client = genai.Client()

    # Create server parameters for stdio connection
    server_params = StdioServerParameters(
        command="npx",  # Executable
        args=["-y", "@philschmid/weather-mcp"],  # MCP Server
        env=None,  # Optional environment variables
    )

    async def run():
        async with stdio_client(server_params) as (read, write):
            async with ClientSession(read, write) as session:
                # Prompt to get the weather for the current day in London.
                prompt = f"What is the weather in London in {datetime.now().strftime('%Y-%m-%d')}?"

                # Initialize the connection between client and server
                await session.initialize()

                # Send request to the model with MCP function declarations
                response = await client.aio.models.generate_content(
                    model="gemini-2.5-flash",
                    contents=prompt,
                    config=genai.types.GenerateContentConfig(
                        temperature=0,
                        tools=[session],  # uses the session, will automatically call the tool
                        # Uncomment if you **don't** want the SDK to automatically call the tool
                        # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig(
                        #     disable=True
                        # ),
                    ),
                )
                print(response.text)

    # Start the asyncio event loop and run the main function
    asyncio.run(run())

JavaScript

Make sure the latest version of the mcp SDK is installed on your platform of choice.

    npm install @modelcontextprotocol/sdk

Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool. If you want to disable it, you can provide automaticFunctionCalling with disable set to true.
    import { GoogleGenAI, mcpToTool } from '@google/genai';
    import { Client } from "@modelcontextprotocol/sdk/client/index.js";
    import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

    // Create server parameters for stdio connection
    const serverParams = new StdioClientTransport({
      command: "npx", // Executable
      args: ["-y", "@philschmid/weather-mcp"] // MCP Server
    });

    const client = new Client(
      {
        name: "example-client",
        version: "1.0.0"
      }
    );

    // Configure the client
    const ai = new GoogleGenAI({});

    // Initialize the connection between client and server
    await client.connect(serverParams);

    // Send request to the model with MCP tools
    const response = await ai.models.generateContent({
      model: "gemini-2.5-flash",
      contents: `What is the weather in London in ${new Date().toLocaleDateString()}?`,
      config: {
        tools: [mcpToTool(client)], // uses the session, will automatically call the tool
        // Uncomment if you **don't** want the sdk to automatically call the tool
        // automaticFunctionCalling: {
        //   disable: true,
        // },
      },
    });
    console.log(response.text);

    // Close the connection
    await client.close();

Limitations with built-in MCP support

Built-in MCP support is an experimental feature in our SDKs and has the following limitations:

- Only tools are supported, not resources or prompts.
- It is available for the Python and JavaScript/TypeScript SDKs.
- Breaking changes might occur in future releases.

Manual integration of MCP servers is always an option if these limit what you're building.

Supported models

This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page.
Model                    Function Calling    Parallel Function Calling    Compositional Function Calling
Gemini 2.5 Pro           ✔️                  ✔️                           ✔️
Gemini 2.5 Flash         ✔️                  ✔️                           ✔️
Gemini 2.5 Flash-Lite    ✔️                  ✔️                           ✔️
Gemini 2.0 Flash         ✔️                  ✔️                           ✔️
Gemini 2.0 Flash-Lite    X                   X                            X

Best practices

Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments.

Naming: Use descriptive function names (without spaces, periods, or dashes).

Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum.

Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools.

Prompt Engineering:
  Provide context: Tell the model its role (e.g., "You are a helpful weather assistant.").
  Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts.").
  Encourage clarification: Instruct the model to ask clarifying questions if needed.

Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls.

Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it.

Error Handling: Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. Return informative error messages that the model can use to generate helpful responses to the user.

Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls.
Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, or breaking down complex tasks into smaller, more focused function sets.

Notes and limitations

Only a subset of the OpenAPI schema is supported.
Supported parameter types in Python are limited.
Automatic function calling is a Python SDK feature only.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-10 UTC.
|
text_content/docs_function-calling_8cc46b89.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/function-calling#function_calling_modes
|
| 2 |
+
Title: Function calling with the Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Function calling with the Gemini API

Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases:

Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases.
Extend Capabilities: Use external tools to perform computations and overcome the limitations of the model, such as using a calculator or creating charts.
Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices.

Example use cases: Get Weather, Schedule Meeting, Create Chart.

How function calling works

Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process:

Define Function Declaration: Define the function declaration in your application code. Function declarations describe the function's name, parameters, and purpose to the model.

Call LLM with function declarations: Send the user prompt along with the function declaration(s) to the model. The model analyzes the request and determines whether a function call would be helpful. If so, it responds with a structured JSON object.

Execute Function Code (Your Responsibility): The model does not execute the function itself.
It's your application's responsibility to process the response and check for a function call:
- Yes: Extract the name and args of the function and execute the corresponding function in your application.
- No: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome).

Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. The model uses the result to generate a final, user-friendly response that incorporates the information from the function call.

This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn (parallel function calling) and in sequence (compositional function calling).

Step 1: Define a function declaration

Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs.

Python

    # Define a function that the model can call to control smart lights
    set_light_values_declaration = {
        "name": "set_light_values",
        "description": "Sets the brightness and color temperature of a light.",
        "parameters": {
            "type": "object",
            "properties": {
                "brightness": {
                    "type": "integer",
                    "description": "Light level from 0 to 100. Zero is off and 100 is full brightness",
                },
                "color_temp": {
                    "type": "string",
                    "enum": ["daylight", "cool", "warm"],
                    "description": "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.",
                },
            },
            "required": ["brightness", "color_temp"],
        },
    }

    # This is the actual function that would be called based on the model's suggestion
    def set_light_values(brightness: int, color_temp: str) -> dict[str, int | str]:
        """Set the brightness and color temperature of a room light. (mock API).

        Args:
            brightness: Light level from 0 to 100. Zero is off and 100 is full brightness
            color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.

        Returns:
            A dictionary containing the set brightness and color temperature.
        """
        return {"brightness": brightness, "colorTemperature": color_temp}

JavaScript

    import { Type } from '@google/genai';

    // Define a function that the model can call to control smart lights
    const setLightValuesFunctionDeclaration = {
      name: 'set_light_values',
      description: 'Sets the brightness and color temperature of a light.',
      parameters: {
        type: Type.OBJECT,
        properties: {
          brightness: {
            type: Type.NUMBER,
            description: 'Light level from 0 to 100. Zero is off and 100 is full brightness',
          },
          color_temp: {
            type: Type.STRING,
            enum: ['daylight', 'cool', 'warm'],
            description: 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.',
          },
        },
        required: ['brightness', 'color_temp'],
      },
    };

    /**
     * Set the brightness and color temperature of a room light. (mock API)
     * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness
     * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.
     * @return {Object} A dictionary containing the set brightness and color temperature.
     */
    function setLightValues(brightness, color_temp) {
      return {
        brightness: brightness,
        colorTemperature: color_temp
      };
    }

Step 2: Call the model with function declarations

Once you have defined your function declarations, you can prompt the model to use them. It analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion.

Python

    from google import genai
    from google.genai import types

    # Configure the client and tools
    client = genai.Client()
    tools = types.Tool(function_declarations=[set_light_values_declaration])
    config = types.GenerateContentConfig(tools=[tools])

    # Define user prompt
    contents = [
        types.Content(
            role="user", parts=[types.Part(text="Turn the lights down to a romantic level")]
        )
    ]

    # Send request with function declarations
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=contents,
        config=config,
    )

    print(response.candidates[0].content.parts[0].function_call)

JavaScript

    import { GoogleGenAI } from '@google/genai';

    // Generation config with function declaration
    const config = {
      tools: [{
        functionDeclarations: [setLightValuesFunctionDeclaration]
      }]
    };

    // Configure the client
    const ai = new GoogleGenAI({});

    // Define user prompt
    const contents = [
      {
        role: 'user',
        parts: [{ text: 'Turn the lights down to a romantic level' }]
      }
    ];

    // Send request with function declarations
    const response = await ai.models.generateContent({
      model: 'gemini-2.5-flash',
      contents: contents,
      config: config
    });

    console.log(response.functionCalls[0]);

The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question.

Python

    id=None args={'color_temp': 'warm', 'brightness': 25} name='set_light_values'

JavaScript

    {
      name: 'set_light_values',
      args: { brightness: 25, color_temp: 'warm' }
    }

Step 3: Execute set_light_values function code

Extract the function call details from the model's response, parse the arguments, and execute the set_light_values function.

Python

    # Extract tool call details, it may not be in the first part.
    tool_call = response.candidates[0].content.parts[0].function_call

    if tool_call.name == "set_light_values":
        result = set_light_values(**tool_call.args)
        print(f"Function execution result: {result}")

JavaScript

    // Extract tool call details
    const tool_call = response.functionCalls[0];

    let result;
    if (tool_call.name === 'set_light_values') {
      result = setLightValues(tool_call.args.brightness, tool_call.args.color_temp);
      console.log(`Function execution result: ${JSON.stringify(result)}`);
    }

Step 4: Create user-friendly response with function result and call the model again

Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user.

Python

    # Create a function response part
    function_response_part = types.Part.from_function_response(
        name=tool_call.name,
        response={"result": result},
    )

    # Append function call and result of the function execution to contents
    contents.append(response.candidates[0].content)  # Append the content from the model's response.
    contents.append(types.Content(role="user", parts=[function_response_part]))  # Append the function response

    final_response = client.models.generate_content(
        model="gemini-2.5-flash",
        config=config,
        contents=contents,
    )

    print(final_response.text)

JavaScript

    // Create a function response part
    const function_response_part = {
      name: tool_call.name,
      response: { result }
    };

    // Append function call and result of the function execution to contents
    contents.push(response.candidates[0].content);
    contents.push({ role: 'user', parts: [{ functionResponse: function_response_part }] });

    // Get the final response from the model
    const final_response = await ai.models.generateContent({
      model: 'gemini-2.5-flash',
      contents: contents,
      config: config
    });

    console.log(final_response.text);

This completes the function calling flow. The model successfully used the set_light_values function to perform the user's requested action.
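The comment in Step 3 notes that the function call may not be in the first part of the response. One way to handle that is to scan every part and dispatch by name. The sketch below uses plain dictionaries in place of SDK objects so that it runs on its own; the `dispatch_function_calls` helper, the `FUNCTIONS` registry, and the dict-shaped parts are illustrative assumptions and not part of the google-genai SDK.

```python
def set_light_values(brightness: int, color_temp: str) -> dict:
    """Mock implementation, mirroring the example in Step 1."""
    return {"brightness": brightness, "colorTemperature": color_temp}


# Hypothetical registry mapping declared function names to Python callables.
FUNCTIONS = {"set_light_values": set_light_values}


def dispatch_function_calls(parts: list) -> list:
    """Run every function call found in `parts` and collect the results.

    Each part is assumed to be a dict that may hold a "function_call" entry
    with "name" and "args" keys; text-only parts are skipped.
    """
    results = []
    for part in parts:
        call = part.get("function_call")
        if call is None:
            continue  # e.g. a text part that precedes the function call
        func = FUNCTIONS.get(call["name"])
        if func is None:
            # Report unknown names as data so the model can react to them.
            response = {"error": f"Unknown function: {call['name']}"}
        else:
            try:
                response = {"result": func(**call["args"])}
            except Exception as exc:
                # Surface failures as an error message instead of crashing.
                response = {"error": str(exc)}
        results.append({"name": call["name"], "response": response})
    return results


parts = [
    {"text": "Dimming the lights."},
    {"function_call": {"name": "set_light_values",
                       "args": {"brightness": 25, "color_temp": "warm"}}},
]
print(dispatch_function_calls(parts))
```

With the SDK, each collected result would then be wrapped with `types.Part.from_function_response` and appended to `contents` as shown in Step 4. Returning errors as data rather than raising follows the error handling advice in the best practices section.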
Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations . You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. Function calling with thinking Enabling "thinking" can improve function call performance by allowing the model to reason through a request before suggesting function calls. However, because the Gemini API is stateless, this reasoning context is lost between turns, which can reduce the quality of function calls as they require multiple turn requests. 
To preserve this context you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. To use thought signatures: Receive the signature: When thinking is enabled, the API response will include a thought_signature field containing an encrypted representation of the model's reasoning. Return the signature: When you send the function's execution result back to the server, include the thought_signature you received. This allows the model to restore its previous thinking context and will likely result in better function calling performance. Receiving signatures from the server Signatures are returned in the part after the model's thinking phase, which typically is a text or function call. Here are some examples of what thought signatures look like returned in each type of part, in response to the request "What's the weather in Lake Tahoe?" using the Get Weather example: Text part [{ "candidates" : [ { "content" : { "parts" : [ { "text" : "Here's what the weather in Lake Tahoe is today" , "thoughtSignature" : "ClcBVKhc7ru7KzUI7SrdUoIdAYLm/+i93aHjfIt4xHyAoO/G70tApxnK2ujBhOhC1PrRy1pkQa88fqFvpHNVd1HDjNLO7mkp6/hFwE+SPPEB3fh0hs4oM8MKhgIBVKhc7uIGvrS7i/T4HpfbnYrluFfWNjZ62gewqe4cVdR/Dlh+zbjtYmDD0gPZ+SuBO7vvHQdzsjePRP+2Y5XddX6LEf/cGGgakq8EhVvw/a6IVzUO6XmpHg2Ag1sl8E9+VFH/lC0R0ZuYdFWligtDuYwp5p5q3o59G0TtWeU2MC1y2MJfE9u/KWd313ldka80/X2W/xF2O/4djMp5G2WKcULfve75zeRCy0mc5iS3SB9mTH0cT6x0vtKjeBx50gcg+CQWtJcRuwTVzz54dmvmK9xvnqA8gKGw3DuaM9wfy5hyY7Qg0z3iyyWdP8T/lbjKim8IEQOk7O1vVwP1Ko7oMYH8JgA1CsoBAVSoXO6v4c5RSyd1cn6EIU0pEFQsjW7rYWPuZdOFq/tsGJT9BCfW7KGkPGwlNSq8jTJFvbcJ/DjtndISQYXwiXd2kGa5JfdS2Kh4zOxCxiWtOk+2nCc3+XQk2nonhO+esGJpkDdbbHZSqRgcUtYKq7q28iPFOQvOFyCiZNB7K86Z/6Hnagu2snSlN/BcTMaFGaWpcCClSUo4foRZn3WbNCoM8rcpD7qEJMp4a5baaSxyyeL1ZTGd2HLpFys/oiW6e3oAnhxuIysCwg==" } ] , "role" : "model" } , "index" : 0 } ] , # Remainder of response... 
Function call part [{ "candidates" : [ { "content" : { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiwBVKhc7nRyTi3HmggPD9iQiRc261f5jwuMdw3H/itDH0emsb9ZVo3Nwx9p6wpsAVSoXO5i8fDV4jBSBLoaWxB5zUdlGY6aIGp+I0oEnwRRSRQ1LOvrDlojEH8JE8HjiKXALdJrvNPiG+HY3GZEO8pZjEZtc3UoBUh7+SVyjK7Xolu7aRYYeUyzrCapoETWypER1jbrJXnFV23hCosBAVSoXO6oIPNJSmbuEDfGafOhuCSHkpr1yjTp35RXYqmCESzRzWf5+nFXLqncqeFo4ohoxbiYQVpVQbOZF81p8o9zg6xeRE7qMeOv+XN7enXGJ4/s3qNFQpfkSMqRdBITN1VpX7jyfEAjvxBNc7PDfDJZmEPY338ZIY5nFFcmzJSWjVrboFt2sMFv+A==" } ] , "role" : "model" } , "finishReason" : "STOP" , "index" : 0 } ] , # Remainder of response... You can confirm that you received a signature and see what a signature looks like using the following code: # Step 2: Call the model with function declarations # ...Generation config, Configure the client, and Define user prompt (No changes) # Send request with declarations (using a thinking model) response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents ) # See thought signatures for part in response . candidates [ 0 ] . content . parts : if part . thought_signature : print ( "Thought signature:" ) print ( part . thought_signature ) Returning signatures back to the server In order to return signatures back: You should return signatures along with their containing parts back to the server You shouldn't merge a part with a signature with another part which also contains a signature. The signature string is not concatenable You shouldn't merge one part with a signature with another part without a signature. This breaks the correct positioning of the thought represented by the signature. The code will remain the same as in Step 4 of the previous section. 
But in this case (as indicated in the comment below) you will return signatures to the model along with the result of the function execution so the model can incorporate the thoughts into its final response: Python # Step 4: Create user friendly response with function result and call the model again # ...Create a function response part (No change) # Append thought signatures, function call and result of the function execution to contents function_call_content = response . candidates [ 0 ] . content # Append the model's function call message, which includes thought signatures contents . append ( function_call_content ) contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Step 4: Create user friendly response with function result and call the model again // ...Create a function response part (No change) // Append thought signatures, function call and result of the function execution to contents const function_response_content = response . candidates [ 0 ]. content ; contents . push ( function_response_content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); The following shows what a request returning a thought signature may look like: [{ "contents" : [ { "role" : "user" , "parts" : [ { "text" : "what is the weather in Lake Tahoe?" 
} ] } , { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiIBVKhc7oDPpCaXyJKKssjqr4g3JNOSgJ/M2V+1THC1icsWCmwBVKhc7pBABbZ+zR3e9234WnWWS6GFXmf8IVwpnzjd5KYd7vyJbn/4vTorWBGayj/vbd9JPaZQjxdAIXhoE5mX/MDsQ7M9N/b0qJjHm39tYIBvS4sIWkMDHqTJqXGLzhhKtrTkfbV3RbaJEkQKmwEBVKhc7qVUgC3hfTXZLo9R3AJzUUIx50NKvJTb9B+UU+LBqgg7Nck1x5OpjWVS2R+SsveprIuYOruk2Y0H53J2OJF8qsxTdIq2si8DGW2V7WK8xyoJH5kbqd7drIw1jLb44b6lx4SMyB0VaULuTBki4d+Ljjg1tJTwR0IYMKqDLDZt9mheINsi0ZxcNjfpnDydRXdWbcSwzmK/wgqJAQFUqFzuKgNVElxs3cbO+xebr2IwcOro84nKTisi0tTp9bICPC9fTUhn3L+rvQWA+d3J1Za8at2bakrqiRj7BTh+CVO9fWQMAEQAs3ni0Z2hfaYG92tOD26E4IoZwyYEoWbfNudpH1fr5tEkyqnEGtWIh7H+XoZQ2DXeiOa+br7Zk88SrNE+trJMCogBAVSoXO5e9fBLg7hnbkmKsrzNLnQtLsQm1gNzjcjEC7nJYklYPp0KI2uGBE1PkM8XNsfllAfHVn7LzHcHNlbQ9pJ7QZTSIeG42goS971r5wNZwxaXwCTphClQh826eqJWo6A/28TtAVQWLhTx5ekbP7qb4nh1UblESZ1saxDQAEo4OKPbDzx5BgqKAQFUqFzuVyjNm5i0wN8hTDnKjfpDroEpPPTs531iFy9BOX+xDCdGHy8D+osFpaoBq6TFekQQbz4hIoUR1YEcP4zI80/cNimEeb9IcFxZTTxiNrbhbbcv0969DSMWhB+ZEqIz4vuw4GLe/xcUvqhlChQwFdgIbdOQHSHpatn5uDlktnP/bi26nKuXIwo0AVSoXO7US22OUH7d1f4abNPI0IyAvhqkPp12rbtWLx9vkOtojE8IP+xCfYtIFuZIzRNZqA==" } ] , "role" : "model" } , { "role" : "user" , "parts" : [ { "functionResponse" : { "name" : "getWeather" , "response" : { "response" : { "stringValue" : "Sunny and hot. 90 degrees Fahrenheit" } } } } ] } ] , # Remainder of request... Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page. Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. 
This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' 
} }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . 
sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) = > ` ${ key } = ${ val } ` ) . join ( ', ' ); console . log ( ` ${ fn . name } ( ${ args } )` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which automatically converts Python functions to declarations, handles the function call execution and response cycle for you. Following is an example for the disco use case. Note: Automatic Function Calling is a Python SDK only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) - > dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. """ return { "status" : f "Disco ball powered { 'on' if power else 'off' } " } def start_music_impl ( energetic : bool , loud : bool ) - > dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) - > dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai . 
Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to this place into party!" , config = config , ) print ( " \n Example 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task. 
import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) -> dict : """Gets the current weather temperature for a given location.""" print ( f"Tool Call: get_weather_forecast(location={location})" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) -> dict : """Sets the thermostat to a desired temperature.""" print ( f"Tool Call: set_thermostat_temperature(temperature={temperature})" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call: get_weather_forecast(location=London) Tool Response: {'temperature': 25, 'unit': 'celsius'} Tool Call: set_thermostat_temperature(temperature=20) Tool Response: {'status': 'success'} OK. I've set the thermostat to 20°C. JavaScript This example shows how to use the JavaScript/TypeScript SDK to do compositional function calling with a manual execution loop.
import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type . NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. 
const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . It 's 25°C in London, so I' ve set the thermostat to 20 ° C . Compositional function calling is a native Live API feature. This means Live API can handle the function calling similar to the Python SDK. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you write run some python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you write run some python code to turn on the lights, wait 10s and then turn off the lights? 
` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , { tools : tools , modality : "AUDIO" }); Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here.
toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK converts the Python functions to declarations and handles the function call execution and response cycle for you. The Python SDK then automatically: Detects function call responses from the model. Calls the corresponding Python function in your code. Sends the function response back to the model. Returns the model's final text response. To use this, define your function with type hints and a docstring, and then pass the function itself (not a JSON declaration) as a tool: Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) -> dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration Automatic schema extraction from Python functions doesn't work in all cases. For example, it doesn't handle cases where you describe the fields of a nested dictionary-object.
The API is able to describe any of the following types: Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | dict [ str , AllowedType ]) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools at the same time, combining native tools with function calling. Here's an example that enables two native tools, Grounding with Google Search and code execution , alongside function declarations in a request using the Live API . Note: Multi-tool use is a Live API only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks!
` ; const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } // not defined here. ]; // Execute the prompt with specified tools in audio modality await run ( prompt , { tools : tools , modality : "AUDIO" }); Python developers can try this out in the Live API Tool Use notebook . Model context protocol (MCP) Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. The Gemini SDKs have built-in support for MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDKs can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until the model makes no more tool calls. The following example shows how to use a local MCP server with Gemini and the mcp SDK. Python Make sure the latest version of the mcp SDK is installed on your platform of choice. pip install mcp Note: Python supports automatic tool calling by passing the ClientSession into the tools parameter. If you want to disable it, you can provide automatic_function_calling with disable set to True . import os import asyncio from datetime import datetime from mcp import ClientSession , StdioServerParameters from mcp.client.stdio import stdio_client from google import genai client = genai .
Client () # Create server parameters for stdio connection server_params = StdioServerParameters ( command = "npx" , # Executable args = [ "-y" , "@philschmid/weather-mcp" ], # MCP Server env = None , # Optional environment variables ) async def run (): async with stdio_client ( server_params ) as ( read , write ): async with ClientSession ( read , write ) as session : # Prompt to get the weather for the current day in London. prompt = f"What is the weather in London in {datetime.now().strftime('%Y-%m-%d')}?" # Initialize the connection between client and server await session . initialize () # Send request to the model with MCP function declarations response = await client . aio . models . generate_content ( model = "gemini-2.5-flash" , contents = prompt , config = genai . types . GenerateContentConfig ( temperature = 0 , tools = [ session ], # uses the session, will automatically call the tool # Uncomment if you **don't** want the SDK to automatically call the tool # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( # disable=True # ), ), ) print ( response . text ) # Start the asyncio event loop and run the main function asyncio . run ( run ()) JavaScript Make sure the latest version of the mcp SDK is installed on your platform of choice. npm install @modelcontextprotocol/sdk Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool . If you want to disable it, you can provide automaticFunctionCalling with disable set to true .
import { GoogleGenAI , FunctionCallingConfigMode , mcpToTool } from '@google/genai' ; import { Client } from "@modelcontextprotocol/sdk/client/index.js" ; import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js" ; // Create server parameters for stdio connection const serverParams = new StdioClientTransport ({ command : "npx" , // Executable args : [ "-y" , "@philschmid/weather-mcp" ] // MCP Server }); const client = new Client ( { name : "example-client" , version : "1.0.0" } ); // Configure the client const ai = new GoogleGenAI ({}); // Initialize the connection between client and server await client . connect ( serverParams ); // Send request to the model with MCP tools const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : `What is the weather in London in ${ new Date (). toLocaleDateString () } ?` , config : { tools : [ mcpToTool ( client )], // uses the session, will automatically call the tool // Uncomment if you **don't** want the sdk to automatically call the tool // automaticFunctionCalling: { // disable: true, // }, }, }); console . log ( response . text ) // Close the connection await client . close (); Limitations with built-in MCP support Built-in MCP support is an experimental feature in our SDKs and has the following limitations: Only tools are supported, not resources or prompts. It is available for the Python and JavaScript/TypeScript SDKs. Breaking changes might occur in future releases. Manual integration of MCP servers is always an option if these limit what you're building. Supported models This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page.
Model | Function Calling | Parallel Function Calling | Compositional Function Calling
Gemini 2.5 Pro | ✔️ | ✔️ | ✔️
Gemini 2.5 Flash | ✔️ | ✔️ | ✔️
Gemini 2.5 Flash-Lite | ✔️ | ✔️ | ✔️
Gemini 2.0 Flash | ✔️ | ✔️ | ✔️
Gemini 2.0 Flash-Lite | X | X | X

Best practices
Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments.
Naming: Use descriptive function names (without spaces, periods, or dashes).
Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum.
Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools.
Prompt Engineering:
Provide context: Tell the model its role (e.g., "You are a helpful weather assistant.").
Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts.").
Encourage clarification: Instruct the model to ask clarifying questions if needed.
Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls.
Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it.
Error Handling: Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. Return informative error messages that the model can use to generate helpful responses to the user.
Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls.
Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, or breaking down complex tasks into smaller, more focused function sets. Notes and limitations Only a subset of the OpenAPI schema is supported. Supported parameter types in Python are limited. Automatic function calling is a Python SDK feature only. Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-10 UTC.
|
text_content/docs_function-calling_dbd56c28.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/function-calling#parallel_function_calling
|
| 2 |
+
Title: Function calling with the Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Function calling with the Gemini API Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. Extend Capabilities: Use external tools to perform computations and overcome the limitations of the model, such as using a calculator or creating charts. Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. How function calling works Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: Define Function Declaration: Define the function declaration in your application code. Function declarations describe the function's name, parameters, and purpose to the model. Call LLM with function declarations: Send the user prompt along with the function declaration(s) to the model. The model analyzes the request and determines whether a function call would be helpful. If so, it responds with a structured JSON object. Execute Function Code (Your Responsibility): The model does not execute the function itself.
It's your application's responsibility to process the response and check for Function Call, if Yes : Extract the name and args of the function and execute the corresponding function in your application. No: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. It will use the result to generate a final, user-friendly response that incorporates the information from the function call. This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn ( parallel function calling ) and in sequence ( compositional function calling ). Step 1: Define a function declaration Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs. Python # Define a function that the model can call to control smart lights set_light_values_declaration = { "name" : "set_light_values" , "description" : "Sets the brightness and color temperature of a light." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "integer" , "description" : "Light level from 0 to 100. Zero is off and 100 is full brightness" , }, "color_temp" : { "type" : "string" , "enum" : [ "daylight" , "cool" , "warm" ], "description" : "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`." , }, }, "required" : [ "brightness" , "color_temp" ], }, } # This is the actual function that would be called based on the model's suggestion def set_light_values ( brightness : int , color_temp : str ) - > dict [ str , int | str ]: """Set the brightness and color temperature of a room light. (mock API). 
Args: brightness: Light level from 0 to 100. Zero is off and 100 is full brightness color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. Returns: A dictionary containing the set brightness and color temperature. """ return { "brightness" : brightness , "colorTemperature" : color_temp } JavaScript import { Type } from '@google/genai' ; // Define a function that the model can call to control smart lights const setLightValuesFunctionDeclaration = { name : 'set_light_values' , description : 'Sets the brightness and color temperature of a light.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'Light level from 0 to 100. Zero is off and 100 is full brightness' , }, color_temp : { type : Type . STRING , enum : [ 'daylight' , 'cool' , 'warm' ], description : 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.' , }, }, required : [ 'brightness' , 'color_temp' ], }, }; /** * Set the brightness and color temperature of a room light. (mock API) * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. * @return {Object} A dictionary containing the set brightness and color temperature. */ function setLightValues ( brightness , color_temp ) { return { brightness : brightness , colorTemperature : color_temp }; } Step 2: Call the model with function declarations Once you have defined your function declarations, you can prompt the model to use them. It analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion. Python from google.genai import types # Configure the client and tools client = genai . Client () tools = types . 
Tool ( function_declarations = [ set_light_values_declaration ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Define user prompt contents = [ types . Content ( role = "user" , parts = [ types . Part ( text = "Turn the lights down to a romantic level" )] ) ] # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = contents , config = config , ) print ( response . candidates [ 0 ] . content . parts [ 0 ] . function_call ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Generation config with function declaration const config = { tools : [{ functionDeclarations : [ setLightValuesFunctionDeclaration ] }] }; // Configure the client const ai = new GoogleGenAI ({}); // Define user prompt const contents = [ { role : 'user' , parts : [{ text : 'Turn the lights down to a romantic level' }] } ]; // Send request with function declarations const response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( response . functionCalls [ 0 ]); The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question. Python id = None args = { 'color_temp' : 'warm' , 'brightness' : 25 } name = 'set_light_values' JavaScript { name : 'set_light_values' , args : { brightness : 25 , color_temp : 'warm' } } Step 3: Execute set_light_values function code Extract the function call details from the model's response, parse the arguments, and execute the set_light_values function. Python # Extract tool call details, it may not be in the first part. tool_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call if tool_call . name == "set_light_values" : result = set_light_values ( ** tool_call .
args ) print ( f"Function execution result: {result}" ) JavaScript // Extract tool call details const tool_call = response . functionCalls [ 0 ] let result ; if ( tool_call . name === 'set_light_values' ) { result = setLightValues ( tool_call . args . brightness , tool_call . args . color_temp ); console . log ( `Function execution result: ${ JSON . stringify ( result ) } ` ); } Step 4: Create a user-friendly response with the function result and call the model again Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. Python # Create a function response part function_response_part = types . Part . from_function_response ( name = tool_call . name , response = { "result" : result }, ) # Append function call and result of the function execution to contents contents . append ( response . candidates [ 0 ] . content ) # Append the content from the model's response. contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Create a function response part const function_response_part = { name : tool_call . name , response : { result } } // Append function call and result of the function execution to contents contents . push ( response . candidates [ 0 ]. content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); // Get the final response from the model const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); This completes the function calling flow. The model successfully used the set_light_values function to perform the action the user requested.
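Steps 3 and 4 above check a single function name with an if statement. When an application declares several functions, one possible pattern is a lookup table from declared names to local callables. The sketch below is an illustration only and makes no API calls: TOOL_REGISTRY and execute_function_call are hypothetical names that are not part of the google-genai SDK, and the returned dictionary simply mirrors the name/response shape of the function response part shown in Step 4.

```python
from typing import Any, Callable


def set_light_values(brightness: int, color_temp: str) -> dict[str, Any]:
    """Mock light API, mirroring the example function from Step 1."""
    return {"brightness": brightness, "colorTemperature": color_temp}


# Hypothetical registry: maps each declared function name to a local callable.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {
    "set_light_values": set_light_values,
}


def execute_function_call(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Run one requested function call and wrap the outcome for the model.

    Unknown names and mismatched arguments are reported as an "error"
    entry instead of raising, so the model can react to the failure.
    """
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return {"name": name, "response": {"error": f"Unknown function: {name}"}}
    try:
        result = func(**args)
    except TypeError as exc:  # missing or unexpected keyword arguments
        return {"name": name, "response": {"error": f"Bad arguments: {exc}"}}
    return {"name": name, "response": {"result": result}}


if __name__ == "__main__":
    # Arguments taken from the example function call shown in Step 2.
    print(execute_function_call("set_light_values", {"brightness": 25, "color_temp": "warm"}))
```

Returning an error entry rather than raising follows the error-handling advice in the best practices: the model receives a message it can use to explain the failure or retry.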
Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations . You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. Function calling with thinking Enabling "thinking" can improve function call performance by allowing the model to reason through a request before suggesting function calls. However, because the Gemini API is stateless, this reasoning context is lost between turns, which can reduce the quality of function calls as they require multiple turn requests. 
To preserve this context you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. To use thought signatures: Receive the signature: When thinking is enabled, the API response will include a thought_signature field containing an encrypted representation of the model's reasoning. Return the signature: When you send the function's execution result back to the server, include the thought_signature you received. This allows the model to restore its previous thinking context and will likely result in better function calling performance. Receiving signatures from the server Signatures are returned in the part after the model's thinking phase, which typically is a text or function call. Here are some examples of what thought signatures look like returned in each type of part, in response to the request "What's the weather in Lake Tahoe?" using the Get Weather example: Text part [{ "candidates" : [ { "content" : { "parts" : [ { "text" : "Here's what the weather in Lake Tahoe is today" , "thoughtSignature" : "ClcBVKhc7ru7KzUI7SrdUoIdAYLm/+i93aHjfIt4xHyAoO/G70tApxnK2ujBhOhC1PrRy1pkQa88fqFvpHNVd1HDjNLO7mkp6/hFwE+SPPEB3fh0hs4oM8MKhgIBVKhc7uIGvrS7i/T4HpfbnYrluFfWNjZ62gewqe4cVdR/Dlh+zbjtYmDD0gPZ+SuBO7vvHQdzsjePRP+2Y5XddX6LEf/cGGgakq8EhVvw/a6IVzUO6XmpHg2Ag1sl8E9+VFH/lC0R0ZuYdFWligtDuYwp5p5q3o59G0TtWeU2MC1y2MJfE9u/KWd313ldka80/X2W/xF2O/4djMp5G2WKcULfve75zeRCy0mc5iS3SB9mTH0cT6x0vtKjeBx50gcg+CQWtJcRuwTVzz54dmvmK9xvnqA8gKGw3DuaM9wfy5hyY7Qg0z3iyyWdP8T/lbjKim8IEQOk7O1vVwP1Ko7oMYH8JgA1CsoBAVSoXO6v4c5RSyd1cn6EIU0pEFQsjW7rYWPuZdOFq/tsGJT9BCfW7KGkPGwlNSq8jTJFvbcJ/DjtndISQYXwiXd2kGa5JfdS2Kh4zOxCxiWtOk+2nCc3+XQk2nonhO+esGJpkDdbbHZSqRgcUtYKq7q28iPFOQvOFyCiZNB7K86Z/6Hnagu2snSlN/BcTMaFGaWpcCClSUo4foRZn3WbNCoM8rcpD7qEJMp4a5baaSxyyeL1ZTGd2HLpFys/oiW6e3oAnhxuIysCwg==" } ] , "role" : "model" } , "index" : 0 } ] , # Remainder of response... 
Function call part [{ "candidates" : [ { "content" : { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiwBVKhc7nRyTi3HmggPD9iQiRc261f5jwuMdw3H/itDH0emsb9ZVo3Nwx9p6wpsAVSoXO5i8fDV4jBSBLoaWxB5zUdlGY6aIGp+I0oEnwRRSRQ1LOvrDlojEH8JE8HjiKXALdJrvNPiG+HY3GZEO8pZjEZtc3UoBUh7+SVyjK7Xolu7aRYYeUyzrCapoETWypER1jbrJXnFV23hCosBAVSoXO6oIPNJSmbuEDfGafOhuCSHkpr1yjTp35RXYqmCESzRzWf5+nFXLqncqeFo4ohoxbiYQVpVQbOZF81p8o9zg6xeRE7qMeOv+XN7enXGJ4/s3qNFQpfkSMqRdBITN1VpX7jyfEAjvxBNc7PDfDJZmEPY338ZIY5nFFcmzJSWjVrboFt2sMFv+A==" } ] , "role" : "model" } , "finishReason" : "STOP" , "index" : 0 } ] , # Remainder of response... You can confirm that you received a signature and see what a signature looks like using the following code: # Step 2: Call the model with function declarations # ...Generation config, Configure the client, and Define user prompt (No changes) # Send request with declarations (using a thinking model) response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents ) # See thought signatures for part in response . candidates [ 0 ] . content . parts : if part . thought_signature : print ( "Thought signature:" ) print ( part . thought_signature ) Returning signatures back to the server In order to return signatures back: You should return signatures along with their containing parts back to the server You shouldn't merge a part with a signature with another part which also contains a signature. The signature string is not concatenable You shouldn't merge one part with a signature with another part without a signature. This breaks the correct positioning of the thought represented by the signature. The code will remain the same as in Step 4 of the previous section. 
But in this case (as indicated in the comment below) you will return signatures to the model along with the result of the function execution so the model can incorporate the thoughts into its final response: Python # Step 4: Create user friendly response with function result and call the model again # ...Create a function response part (No change) # Append thought signatures, function call and result of the function execution to contents function_call_content = response . candidates [ 0 ] . content # Append the model's function call message, which includes thought signatures contents . append ( function_call_content ) contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Step 4: Create user friendly response with function result and call the model again // ...Create a function response part (No change) // Append thought signatures, function call and result of the function execution to contents const function_response_content = response . candidates [ 0 ]. content ; contents . push ( function_response_content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); The following shows what a request returning a thought signature may look like: [{ "contents" : [ { "role" : "user" , "parts" : [ { "text" : "what is the weather in Lake Tahoe?" 
} ] } , { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiIBVKhc7oDPpCaXyJKKssjqr4g3JNOSgJ/M2V+1THC1icsWCmwBVKhc7pBABbZ+zR3e9234WnWWS6GFXmf8IVwpnzjd5KYd7vyJbn/4vTorWBGayj/vbd9JPaZQjxdAIXhoE5mX/MDsQ7M9N/b0qJjHm39tYIBvS4sIWkMDHqTJqXGLzhhKtrTkfbV3RbaJEkQKmwEBVKhc7qVUgC3hfTXZLo9R3AJzUUIx50NKvJTb9B+UU+LBqgg7Nck1x5OpjWVS2R+SsveprIuYOruk2Y0H53J2OJF8qsxTdIq2si8DGW2V7WK8xyoJH5kbqd7drIw1jLb44b6lx4SMyB0VaULuTBki4d+Ljjg1tJTwR0IYMKqDLDZt9mheINsi0ZxcNjfpnDydRXdWbcSwzmK/wgqJAQFUqFzuKgNVElxs3cbO+xebr2IwcOro84nKTisi0tTp9bICPC9fTUhn3L+rvQWA+d3J1Za8at2bakrqiRj7BTh+CVO9fWQMAEQAs3ni0Z2hfaYG92tOD26E4IoZwyYEoWbfNudpH1fr5tEkyqnEGtWIh7H+XoZQ2DXeiOa+br7Zk88SrNE+trJMCogBAVSoXO5e9fBLg7hnbkmKsrzNLnQtLsQm1gNzjcjEC7nJYklYPp0KI2uGBE1PkM8XNsfllAfHVn7LzHcHNlbQ9pJ7QZTSIeG42goS971r5wNZwxaXwCTphClQh826eqJWo6A/28TtAVQWLhTx5ekbP7qb4nh1UblESZ1saxDQAEo4OKPbDzx5BgqKAQFUqFzuVyjNm5i0wN8hTDnKjfpDroEpPPTs531iFy9BOX+xDCdGHy8D+osFpaoBq6TFekQQbz4hIoUR1YEcP4zI80/cNimEeb9IcFxZTTxiNrbhbbcv0969DSMWhB+ZEqIz4vuw4GLe/xcUvqhlChQwFdgIbdOQHSHpatn5uDlktnP/bi26nKuXIwo0AVSoXO7US22OUH7d1f4abNPI0IyAvhqkPp12rbtWLx9vkOtojE8IP+xCfYtIFuZIzRNZqA==" } ] , "role" : "model" } , { "role" : "user" , "parts" : [ { "functionResponse" : { "name" : "getWeather" , "response" : { "response" : { "stringValue" : "Sunny and hot. 90 degrees Fahrenheit" } } } } ] } ] , # Remainder of request... Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page. Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. 
This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' 
} }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . 
sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) => `${ key }=${ val }` ) . join ( ', ' ); console . log ( `${ fn . name }(${ args })` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which automatically converts Python functions to declarations and handles the function call execution and response cycle for you. The following is an example for the disco use case. Note: Automatic Function Calling is a Python SDK only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) -> dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. """ return { "status" : f"Disco ball powered { 'on' if power else 'off' }" } def start_music_impl ( energetic : bool , loud : bool ) -> dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) -> dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai . 
Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to turn this place into a party!" , config = config , ) print ( "\nExample 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task. 
import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) -> dict : """Gets the current weather temperature for a given location.""" print ( f"Tool Call: get_weather_forecast(location={location})" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) -> dict : """Sets the thermostat to a desired temperature.""" print ( f"Tool Call: set_thermostat_temperature(temperature={temperature})" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call: get_weather_forecast(location=London) Tool Response: {'temperature': 25, 'unit': 'celsius'} Tool Call: set_thermostat_temperature(temperature=20) Tool Response: {'status': 'success'} OK. I've set the thermostat to 20°C. JavaScript This example shows how to use the JavaScript/TypeScript SDK to do compositional function calling using a manual execution loop. 
import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type . NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. 
const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the manual loop orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call: get_weather_forecast(location=London) Tool Response: {'temperature': 25, 'unit': 'celsius'} Tool Call: set_thermostat_temperature(temperature=20) Tool Response: {'status': 'success'} OK. It's 25°C in London, so I've set the thermostat to 20°C. Compositional function calling is a native Live API feature. This means the Live API can handle function calling much as the Python SDK does. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? 
` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , { tools : tools , modality : "AUDIO" }); Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here. 
toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK automatically converts the Python functions to declarations and handles the function call execution and response cycle for you. The Python SDK then automatically: Detects function call responses from the model. Calls the corresponding Python function in your code. Sends the function response back to the model. Returns the model's final text response. To use this, define your function with type hints and a docstring, and then pass the function itself (not a JSON declaration) as a tool: Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) -> dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration Automatic schema extraction from Python functions doesn't work in all cases. For example, it doesn't handle cases where you describe the fields of a nested dictionary-object. 
The API is able to describe any of the following types: Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | dict [ str , AllowedType ]) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools combining native tools with function calling at the same time. Here's an example that enables two tools, Grounding with Google Search and code execution , in a request using the Live API . Note: Multi-tool use is a Live API-only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! 
` ; const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } // not defined here. ]; // Execute the prompt with specified tools in audio modality await run ( prompt , { tools : tools , modality : "AUDIO" }); Python developers can try this out in the Live API Tool Use notebook . Model context protocol (MCP) Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. The Gemini SDKs have built-in support for the MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDK can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until no more tool calls are made by the model. Here, you can find an example of how to use a local MCP server with Gemini and mcp SDK. Python Make sure the latest version of the mcp SDK is installed on your platform of choice. pip install mcp Note: Python supports automatic tool calling by passing in the ClientSession into the tools parameters. If you want to disable it, you can provide automatic_function_calling with disabled True . import os import asyncio from datetime import datetime from mcp import ClientSession , StdioServerParameters from mcp.client.stdio import stdio_client from google import genai client = genai . 
Client () # Create server parameters for stdio connection server_params = StdioServerParameters ( command = "npx" , # Executable args = [ "-y" , "@philschmid/weather-mcp" ], # MCP Server env = None , # Optional environment variables ) async def run (): async with stdio_client ( server_params ) as ( read , write ): async with ClientSession ( read , write ) as session : # Prompt to get the weather for the current day in London. prompt = f"What is the weather in London in { datetime . now () . strftime ( '%Y-%m-%d' ) }?" # Initialize the connection between client and server await session . initialize () # Send request to the model with MCP function declarations response = await client . aio . models . generate_content ( model = "gemini-2.5-flash" , contents = prompt , config = genai . types . GenerateContentConfig ( temperature = 0 , tools = [ session ], # uses the session, will automatically call the tool # Uncomment if you **don't** want the SDK to automatically call the tool # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( # disable=True # ), ), ) print ( response . text ) # Start the asyncio event loop and run the main function asyncio . run ( run ()) JavaScript Make sure the latest version of the mcp SDK is installed on your platform of choice. npm install @modelcontextprotocol/sdk Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool . If you want to disable it, you can provide automaticFunctionCalling with disable set to true . 
import { GoogleGenAI , FunctionCallingConfigMode , mcpToTool } from '@google/genai' ; import { Client } from "@modelcontextprotocol/sdk/client/index.js" ; import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js" ; // Create server parameters for stdio connection const serverParams = new StdioClientTransport ({ command : "npx" , // Executable args : [ "-y" , "@philschmid/weather-mcp" ] // MCP Server }); const client = new Client ( { name : "example-client" , version : "1.0.0" } ); // Configure the client const ai = new GoogleGenAI ({}); // Initialize the connection between client and server await client . connect ( serverParams ); // Send request to the model with MCP tools const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : `What is the weather in London in ${ new Date (). toLocaleDateString () } ?` , config : { tools : [ mcpToTool ( client )], // uses the session, will automatically call the tool // Uncomment if you **don't** want the sdk to automatically call the tool // automaticFunctionCalling: { // disable: true, // }, }, }); console . log ( response . text ) // Close the connection await client . close (); Limitations with built-in MCP support Built-in MCP support is an experimental feature in our SDKs and has the following limitations: Only tools are supported, not resources or prompts. It is available for the Python and JavaScript/TypeScript SDKs. Breaking changes might occur in future releases. Manual integration of MCP servers is always an option if these limit what you're building. Supported models This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page. 
Model Function Calling Parallel Function Calling Compositional Function Calling Gemini 2.5 Pro ✔️ ✔️ ✔️ Gemini 2.5 Flash ✔️ ✔️ ✔️ Gemini 2.5 Flash-Lite ✔️ ✔️ ✔️ Gemini 2.0 Flash ✔️ ✔️ ✔️ Gemini 2.0 Flash-Lite X X X Best practices Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments. Naming: Use descriptive function names (without spaces, periods, or dashes). Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum. Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools. Prompt Engineering: Provide context: Tell the model its role (e.g., "You are a helpful weather assistant."). Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts."). Encourage clarification: Instruct the model to ask clarifying questions if needed. Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls. Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it. Error Handling : Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. Return informative error messages that the model can use to generate helpful responses to the user. Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls. 
Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, or breaking down complex tasks into smaller, more focused function sets. Notes and limitations Only a subset of the OpenAPI schema is supported. Supported parameter types in Python are limited. Automatic function calling is a Python SDK feature only. Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-10 UTC.
|
text_content/docs_function-calling_f7913038.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/function-calling#multi-tool-use
|
| 2 |
+
Title: Function calling with the Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Function calling with the Gemini API Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. Extend Capabilities: Use external tools to perform computations and overcome the limitations of the model, such as using a calculator or creating charts. Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. Get Weather Schedule Meeting Create Chart How function calling works Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: Define Function Declaration: Define the function declaration in your application code. Function Declarations describe the function's name, parameters, and purpose to the model. Call LLM with function declarations: Send the user prompt along with the function declaration(s) to the model. The model analyzes the request and determines if a function call would be helpful. If so, it responds with a structured JSON object. Execute Function Code (Your Responsibility): The Model does not execute the function itself. 
It's your application's responsibility to process the response and check for a function call. If yes: Extract the name and args of the function and execute the corresponding function in your application. If no: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). Create a user-friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. The model will use the result to generate a final, user-friendly response that incorporates the information from the function call. This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn ( parallel function calling ) and in sequence ( compositional function calling ). Step 1: Define a function declaration Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs. Python # Define a function that the model can call to control smart lights set_light_values_declaration = { "name" : "set_light_values" , "description" : "Sets the brightness and color temperature of a light." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "integer" , "description" : "Light level from 0 to 100. Zero is off and 100 is full brightness" , }, "color_temp" : { "type" : "string" , "enum" : [ "daylight" , "cool" , "warm" ], "description" : "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`." , }, }, "required" : [ "brightness" , "color_temp" ], }, } # This is the actual function that would be called based on the model's suggestion def set_light_values ( brightness : int , color_temp : str ) -> dict [ str , int | str ]: """Set the brightness and color temperature of a room light. (mock API). 
Args: brightness: Light level from 0 to 100. Zero is off and 100 is full brightness color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. Returns: A dictionary containing the set brightness and color temperature. """ return { "brightness" : brightness , "colorTemperature" : color_temp } JavaScript import { Type } from '@google/genai' ; // Define a function that the model can call to control smart lights const setLightValuesFunctionDeclaration = { name : 'set_light_values' , description : 'Sets the brightness and color temperature of a light.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'Light level from 0 to 100. Zero is off and 100 is full brightness' , }, color_temp : { type : Type . STRING , enum : [ 'daylight' , 'cool' , 'warm' ], description : 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.' , }, }, required : [ 'brightness' , 'color_temp' ], }, }; /** * Set the brightness and color temperature of a room light. (mock API) * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. * @return {Object} A dictionary containing the set brightness and color temperature. */ function setLightValues ( brightness , color_temp ) { return { brightness : brightness , colorTemperature : color_temp }; } Step 2: Call the model with function declarations Once you have defined your function declarations, you can prompt the model to use them. The model analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion. Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () tools = types . 
Tool ( function_declarations = [ set_light_values_declaration ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Define user prompt contents = [ types . Content ( role = "user" , parts = [ types . Part ( text = "Turn the lights down to a romantic level" )] ) ] # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = contents , config = config , ) print ( response . candidates [ 0 ] . content . parts [ 0 ] . function_call ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Generation config with function declaration const config = { tools : [{ functionDeclarations : [ setLightValuesFunctionDeclaration ] }] }; // Configure the client const ai = new GoogleGenAI ({}); // Define user prompt const contents = [ { role : 'user' , parts : [{ text : 'Turn the lights down to a romantic level' }] } ]; // Send request with function declarations const response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( response . functionCalls [ 0 ]); The model then returns a functionCall object in an OpenAPI-compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question. Python id = None args = { 'color_temp' : 'warm' , 'brightness' : 25 } name = 'set_light_values' JavaScript { name : 'set_light_values' , args : { brightness : 25 , color_temp : 'warm' } } Step 3: Execute set_light_values function code Extract the function call details from the model's response, parse the arguments, and execute the set_light_values function. Python # Extract tool call details, it may not be in the first part. tool_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call if tool_call . name == "set_light_values" : result = set_light_values ( ** tool_call . 
args ) print ( f "Function execution result: { result } " ) JavaScript // Extract tool call details const tool_call = response . functionCalls [ 0 ] let result ; if ( tool_call . name === 'set_light_values' ) { result = setLightValues ( tool_call . args . brightness , tool_call . args . color_temp ); console . log ( `Function execution result: ${ JSON . stringify ( result ) } ` ); } Step 4: Create user friendly response with function result and call the model again Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. Python # Create a function response part function_response_part = types . Part . from_function_response ( name = tool_call . name , response = { "result" : result }, ) # Append function call and result of the function execution to contents contents . append ( response . candidates [ 0 ] . content ) # Append the content from the model's response. contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Create a function response part const function_response_part = { name : tool_call . name , response : { result } } // Append function call and result of the function execution to contents contents . push ( response . candidates [ 0 ]. content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); // Get the final response from the model const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); This completes the function calling flow. The model successfully used the set_light_values function to perform the request action of the user. 
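The four steps above hard-code a single function name in Step 3. When several functions are declared, one common way to handle the "execute" step is a small dispatch table. The following is a minimal sketch under stated assumptions, not part of the Gemini SDK: `FUNCTION_REGISTRY` and `dispatch_function_call` are hypothetical names introduced here for illustration, and the function call is represented as a plain name plus an args dictionary, matching the Step 2 output shown earlier. No request is sent to the model.

```python
from typing import Any, Callable


def set_light_values(brightness: int, color_temp: str) -> dict[str, Any]:
    """Mock implementation mirroring the example function from Step 1."""
    return {"brightness": brightness, "colorTemperature": color_temp}


# Hypothetical registry: maps declared function names to local callables.
FUNCTION_REGISTRY: dict[str, Callable[..., Any]] = {
    "set_light_values": set_light_values,
}


def dispatch_function_call(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Run the requested function and wrap the outcome as a name/response payload."""
    handler = FUNCTION_REGISTRY.get(name)
    if handler is None:
        # Report unknown names as data instead of raising, so the caller can
        # pass the error back to the model in the next turn.
        return {"name": name, "response": {"error": f"Unknown function: {name}"}}
    try:
        result = handler(**args)
    except TypeError as exc:
        # Raised when the arguments do not match the function signature.
        return {"name": name, "response": {"error": str(exc)}}
    return {"name": name, "response": {"result": result}}


# Example call using the name and args from the Step 2 output.
payload = dispatch_function_call(
    "set_light_values", {"color_temp": "warm", "brightness": 25}
)
print(payload)
```

The returned dictionary has the same name/response shape as the function response part built in Step 4, so it could presumably be passed on from there. Returning errors as data rather than raising follows the error-handling advice in the best practices, but whether that suits a given application is a design choice.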
Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations . You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. Function calling with thinking Enabling "thinking" can improve function call performance by allowing the model to reason through a request before suggesting function calls. However, because the Gemini API is stateless, this reasoning context is lost between turns, which can reduce the quality of function calls as they require multiple turn requests. 
To preserve this context you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. To use thought signatures: Receive the signature: When thinking is enabled, the API response will include a thought_signature field containing an encrypted representation of the model's reasoning. Return the signature: When you send the function's execution result back to the server, include the thought_signature you received. This allows the model to restore its previous thinking context and will likely result in better function calling performance. Receiving signatures from the server Signatures are returned in the part after the model's thinking phase, which typically is a text or function call. Here are some examples of what thought signatures look like returned in each type of part, in response to the request "What's the weather in Lake Tahoe?" using the Get Weather example: Text part [{ "candidates" : [ { "content" : { "parts" : [ { "text" : "Here's what the weather in Lake Tahoe is today" , "thoughtSignature" : "ClcBVKhc7ru7KzUI7SrdUoIdAYLm/+i93aHjfIt4xHyAoO/G70tApxnK2ujBhOhC1PrRy1pkQa88fqFvpHNVd1HDjNLO7mkp6/hFwE+SPPEB3fh0hs4oM8MKhgIBVKhc7uIGvrS7i/T4HpfbnYrluFfWNjZ62gewqe4cVdR/Dlh+zbjtYmDD0gPZ+SuBO7vvHQdzsjePRP+2Y5XddX6LEf/cGGgakq8EhVvw/a6IVzUO6XmpHg2Ag1sl8E9+VFH/lC0R0ZuYdFWligtDuYwp5p5q3o59G0TtWeU2MC1y2MJfE9u/KWd313ldka80/X2W/xF2O/4djMp5G2WKcULfve75zeRCy0mc5iS3SB9mTH0cT6x0vtKjeBx50gcg+CQWtJcRuwTVzz54dmvmK9xvnqA8gKGw3DuaM9wfy5hyY7Qg0z3iyyWdP8T/lbjKim8IEQOk7O1vVwP1Ko7oMYH8JgA1CsoBAVSoXO6v4c5RSyd1cn6EIU0pEFQsjW7rYWPuZdOFq/tsGJT9BCfW7KGkPGwlNSq8jTJFvbcJ/DjtndISQYXwiXd2kGa5JfdS2Kh4zOxCxiWtOk+2nCc3+XQk2nonhO+esGJpkDdbbHZSqRgcUtYKq7q28iPFOQvOFyCiZNB7K86Z/6Hnagu2snSlN/BcTMaFGaWpcCClSUo4foRZn3WbNCoM8rcpD7qEJMp4a5baaSxyyeL1ZTGd2HLpFys/oiW6e3oAnhxuIysCwg==" } ] , "role" : "model" } , "index" : 0 } ] , # Remainder of response... 
Function call part [{ "candidates" : [ { "content" : { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiwBVKhc7nRyTi3HmggPD9iQiRc261f5jwuMdw3H/itDH0emsb9ZVo3Nwx9p6wpsAVSoXO5i8fDV4jBSBLoaWxB5zUdlGY6aIGp+I0oEnwRRSRQ1LOvrDlojEH8JE8HjiKXALdJrvNPiG+HY3GZEO8pZjEZtc3UoBUh7+SVyjK7Xolu7aRYYeUyzrCapoETWypER1jbrJXnFV23hCosBAVSoXO6oIPNJSmbuEDfGafOhuCSHkpr1yjTp35RXYqmCESzRzWf5+nFXLqncqeFo4ohoxbiYQVpVQbOZF81p8o9zg6xeRE7qMeOv+XN7enXGJ4/s3qNFQpfkSMqRdBITN1VpX7jyfEAjvxBNc7PDfDJZmEPY338ZIY5nFFcmzJSWjVrboFt2sMFv+A==" } ] , "role" : "model" } , "finishReason" : "STOP" , "index" : 0 } ] , # Remainder of response... You can confirm that you received a signature and see what a signature looks like using the following code: # Step 2: Call the model with function declarations # ...Generation config, Configure the client, and Define user prompt (No changes) # Send request with declarations (using a thinking model) response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents ) # See thought signatures for part in response . candidates [ 0 ] . content . parts : if part . thought_signature : print ( "Thought signature:" ) print ( part . thought_signature ) Returning signatures back to the server In order to return signatures back: You should return signatures along with their containing parts back to the server You shouldn't merge a part with a signature with another part which also contains a signature. The signature string is not concatenable You shouldn't merge one part with a signature with another part without a signature. This breaks the correct positioning of the thought represented by the signature. The code will remain the same as in Step 4 of the previous section. 
But in this case (as indicated in the comment below) you will return signatures to the model along with the result of the function execution so the model can incorporate the thoughts into its final response: Python # Step 4: Create user friendly response with function result and call the model again # ...Create a function response part (No change) # Append thought signatures, function call and result of the function execution to contents function_call_content = response . candidates [ 0 ] . content # Append the model's function call message, which includes thought signatures contents . append ( function_call_content ) contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Step 4: Create user friendly response with function result and call the model again // ...Create a function response part (No change) // Append thought signatures, function call and result of the function execution to contents const function_response_content = response . candidates [ 0 ]. content ; contents . push ( function_response_content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); The following shows what a request returning a thought signature may look like: [{ "contents" : [ { "role" : "user" , "parts" : [ { "text" : "what is the weather in Lake Tahoe?" 
} ] } , { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiIBVKhc7oDPpCaXyJKKssjqr4g3JNOSgJ/M2V+1THC1icsWCmwBVKhc7pBABbZ+zR3e9234WnWWS6GFXmf8IVwpnzjd5KYd7vyJbn/4vTorWBGayj/vbd9JPaZQjxdAIXhoE5mX/MDsQ7M9N/b0qJjHm39tYIBvS4sIWkMDHqTJqXGLzhhKtrTkfbV3RbaJEkQKmwEBVKhc7qVUgC3hfTXZLo9R3AJzUUIx50NKvJTb9B+UU+LBqgg7Nck1x5OpjWVS2R+SsveprIuYOruk2Y0H53J2OJF8qsxTdIq2si8DGW2V7WK8xyoJH5kbqd7drIw1jLb44b6lx4SMyB0VaULuTBki4d+Ljjg1tJTwR0IYMKqDLDZt9mheINsi0ZxcNjfpnDydRXdWbcSwzmK/wgqJAQFUqFzuKgNVElxs3cbO+xebr2IwcOro84nKTisi0tTp9bICPC9fTUhn3L+rvQWA+d3J1Za8at2bakrqiRj7BTh+CVO9fWQMAEQAs3ni0Z2hfaYG92tOD26E4IoZwyYEoWbfNudpH1fr5tEkyqnEGtWIh7H+XoZQ2DXeiOa+br7Zk88SrNE+trJMCogBAVSoXO5e9fBLg7hnbkmKsrzNLnQtLsQm1gNzjcjEC7nJYklYPp0KI2uGBE1PkM8XNsfllAfHVn7LzHcHNlbQ9pJ7QZTSIeG42goS971r5wNZwxaXwCTphClQh826eqJWo6A/28TtAVQWLhTx5ekbP7qb4nh1UblESZ1saxDQAEo4OKPbDzx5BgqKAQFUqFzuVyjNm5i0wN8hTDnKjfpDroEpPPTs531iFy9BOX+xDCdGHy8D+osFpaoBq6TFekQQbz4hIoUR1YEcP4zI80/cNimEeb9IcFxZTTxiNrbhbbcv0969DSMWhB+ZEqIz4vuw4GLe/xcUvqhlChQwFdgIbdOQHSHpatn5uDlktnP/bi26nKuXIwo0AVSoXO7US22OUH7d1f4abNPI0IyAvhqkPp12rbtWLx9vkOtojE8IP+xCfYtIFuZIzRNZqA==" } ] , "role" : "model" } , { "role" : "user" , "parts" : [ { "functionResponse" : { "name" : "getWeather" , "response" : { "response" : { "stringValue" : "Sunny and hot. 90 degrees Fahrenheit" } } } } ] } ] , # Remainder of request... Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page. Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. 
This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' 
} }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . 
sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) = > ` ${ key } = ${ val } ` ) . join ( ', ' ); console . log ( ` ${ fn . name } ( ${ args } )` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which automatically converts Python functions to declarations, handles the function call execution and response cycle for you. Following is an example for the disco use case. Note: Automatic Function Calling is a Python SDK only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) - > dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. """ return { "status" : f "Disco ball powered { 'on' if power else 'off' } " } def start_music_impl ( energetic : bool , loud : bool ) - > dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) - > dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai . 
Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to turn this place into a party!" , config = config , ) print ( "\nExample 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task. 
import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) -> dict : """Gets the current weather temperature for a given location.""" print ( f"Tool Call: get_weather_forecast(location={location})" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) -> dict : """Sets the thermostat to a desired temperature.""" print ( f"Tool Call: set_thermostat_temperature(temperature={temperature})" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call: get_weather_forecast(location=London) Tool Response: {'temperature': 25, 'unit': 'celsius'} Tool Call: set_thermostat_temperature(temperature=20) Tool Response: {'status': 'success'} OK. I've set the thermostat to 20°C. JavaScript This example shows how to use the JavaScript/TypeScript SDK to do compositional function calling using a manual execution loop. 
import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type . NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. 
const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the manual loop orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call: get_weather_forecast(location=London) Tool Response: {'temperature': 25, 'unit': 'celsius'} Tool Call: set_thermostat_temperature(temperature=20) Tool Response: {'status': 'success'} OK. It's 25°C in London, so I've set the thermostat to 20°C. Compositional function calling is a native Live API feature. This means the Live API can handle function calling similarly to the Python SDK. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? 
` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , { tools : tools , modality : "AUDIO" }); Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here. 
toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK converts each Python function into a declaration and handles the function call execution and the response cycle for you. The Python SDK then automatically: Detects function call responses from the model. Calls the corresponding Python function in your code. Sends the function response back to the model. Returns the model's final text response. To use this, define your function with type hints and a docstring, and then pass the function itself (not a JSON declaration) as a tool: Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) -> dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration Automatic schema extraction from Python functions doesn't work in all cases. For example, it doesn't handle cases where you describe the fields of a nested dictionary-object. 
The API is able to describe any of the following types: Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | dict [ str , AllowedType ]) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools at the same time, combining native tools with function calling. Here's an example that enables two native tools, Grounding with Google Search and code execution , alongside function declarations in a request using the Live API . Note: Multi-tool use is a Live API-only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! 
` ; const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } // not defined here. ]; // Execute the prompt with specified tools in audio modality await run ( prompt , { tools : tools , modality : "AUDIO" }); Python developers can try this out in the Live API Tool Use notebook . Model context protocol (MCP) Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. The Gemini SDKs have built-in support for MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDKs can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until the model makes no more tool calls. The following example shows how to use a local MCP server with Gemini and the mcp SDK. Python Make sure the latest version of the mcp SDK is installed on your platform of choice. pip install mcp Note: Python supports automatic tool calling by passing the ClientSession into the tools parameter. If you want to disable it, you can provide automatic_function_calling with disable set to True . import os import asyncio from datetime import datetime from mcp import ClientSession , StdioServerParameters from mcp.client.stdio import stdio_client from google import genai client = genai . 
Client () # Create server parameters for stdio connection server_params = StdioServerParameters ( command = "npx" , # Executable args = [ "-y" , "@philschmid/weather-mcp" ], # MCP Server env = None , # Optional environment variables ) async def run (): async with stdio_client ( server_params ) as ( read , write ): async with ClientSession ( read , write ) as session : # Prompt to get the weather for the current day in London. prompt = f"What is the weather in London in {datetime.now().strftime('%Y-%m-%d')}?" # Initialize the connection between client and server await session . initialize () # Send request to the model with MCP function declarations response = await client . aio . models . generate_content ( model = "gemini-2.5-flash" , contents = prompt , config = genai . types . GenerateContentConfig ( temperature = 0 , tools = [ session ], # uses the session, will automatically call the tool # Uncomment if you **don't** want the SDK to automatically call the tool # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( # disable=True # ), ), ) print ( response . text ) # Start the asyncio event loop and run the main function asyncio . run ( run ()) JavaScript Make sure the latest version of the mcp SDK is installed on your platform of choice. npm install @modelcontextprotocol/sdk Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool . If you want to disable it, you can provide automaticFunctionCalling with disable set to true . 
import { GoogleGenAI , FunctionCallingConfigMode , mcpToTool } from '@google/genai' ; import { Client } from "@modelcontextprotocol/sdk/client/index.js" ; import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js" ; // Create server parameters for stdio connection const serverParams = new StdioClientTransport ({ command : "npx" , // Executable args : [ "-y" , "@philschmid/weather-mcp" ] // MCP Server }); const client = new Client ( { name : "example-client" , version : "1.0.0" } ); // Configure the client const ai = new GoogleGenAI ({}); // Initialize the connection between client and server await client . connect ( serverParams ); // Send request to the model with MCP tools const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : `What is the weather in London in ${ new Date (). toLocaleDateString () } ?` , config : { tools : [ mcpToTool ( client )], // uses the MCP client, will automatically call the tool // Uncomment if you **don't** want the SDK to automatically call the tool // automaticFunctionCalling: { // disable: true, // }, }, }); console . log ( response . text ) // Close the connection await client . close (); Limitations with built-in MCP support Built-in MCP support is an experimental feature in our SDKs and has the following limitations: Only tools are supported, not resources or prompts. It is available for the Python and JavaScript/TypeScript SDKs. Breaking changes might occur in future releases. Manual integration of MCP servers is always an option if these limit what you're building. Supported models This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page. 
Model | Function Calling | Parallel Function Calling | Compositional Function Calling
Gemini 2.5 Pro | ✔️ | ✔️ | ✔️
Gemini 2.5 Flash | ✔️ | ✔️ | ✔️
Gemini 2.5 Flash-Lite | ✔️ | ✔️ | ✔️
Gemini 2.0 Flash | ✔️ | ✔️ | ✔️
Gemini 2.0 Flash-Lite | X | X | X
Best practices Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments. Naming: Use descriptive function names (without spaces, periods, or dashes). Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum. Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools. Prompt Engineering: Provide context: Tell the model its role (e.g., "You are a helpful weather assistant."). Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts."). Encourage clarification: Instruct the model to ask clarifying questions if needed. Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls. Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it. Error Handling: Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. Return informative error messages that the model can use to generate helpful responses to the user. Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls. 
Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, or breaking down complex tasks into smaller, more focused function sets. Notes and limitations Only a subset of the OpenAPI schema is supported. Supported parameter types in Python are limited. Automatic function calling is a Python SDK feature only. Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-10 UTC.
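As a rough illustration of the error-handling advice in the best practices above, the following sketch routes a function call through a small dispatcher that always returns a dictionary the model can read, whether the call succeeded or failed. The names dispatch_tool_call and TOOL_REGISTRY, the stub tool, and the 5-35 °C range check are all assumptions made for this example; none of them come from the Gemini SDK.

```python
from typing import Any, Callable


def set_thermostat_temperature(temperature: float) -> dict:
    """Stub tool mirroring the thermostat example; a real one would call an API."""
    # Assumed validity range, chosen only to demonstrate input validation.
    if not 5 <= temperature <= 35:
        raise ValueError(f"temperature {temperature} is outside the 5-35 °C range")
    return {"status": "success", "temperature": temperature}


# Hypothetical registry mapping declared function names to Python callables.
TOOL_REGISTRY: dict[str, Callable[..., dict]] = {
    "set_thermostat_temperature": set_thermostat_temperature,
}


def dispatch_tool_call(name: str, args: dict[str, Any]) -> dict:
    """Run one function call and report failures as data instead of raising."""
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        return {"error": f"Unknown function: {name}"}
    try:
        return {"result": tool(**args)}
    except (TypeError, ValueError) as exc:
        # Missing or unexpected arguments raise TypeError; bad values raise ValueError.
        return {"error": str(exc)}


print(dispatch_tool_call("set_thermostat_temperature", {"temperature": 20}))
print(dispatch_tool_call("set_thermostat_temperature", {"temperature": 90}))
print(dispatch_tool_call("open_window", {}))
```

The returned dictionary could then be placed in the response field of a function response part, so the model sees the error text and can explain it or retry with different arguments.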
|
text_content/docs_google-search_5a931e9e.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/google-search
|
| 2 |
+
Title: Grounding with Google Search | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Grounding with Google Search Grounding with Google Search connects the Gemini model to real-time web content and works with all available languages . This allows Gemini to provide more accurate answers and cite verifiable sources beyond its knowledge cutoff. Grounding helps you build applications that can: Increase factual accuracy: Reduce model hallucinations by basing responses on real-world information. Access real-time information: Answer questions about recent events and topics. Provide citations: Build user trust by showing the sources for the model's claims. Python from google import genai from google.genai import types # Configure the client client = genai . Client () # Define the grounding tool grounding_tool = types . Tool ( google_search = types . GoogleSearch () ) # Configure generation settings config = types . GenerateContentConfig ( tools = [ grounding_tool ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Who won the euro 2024?" , config = config , ) # Print the grounded response print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Define the grounding tool const groundingTool = { googleSearch : {}, }; // Configure generation settings const config = { tools : [ groundingTool ], }; // Make the request const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : "Who won the euro 2024?" , config , }); // Print the grounded response console . log ( response . 
text ); REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -X POST \ -d '{ "contents": [ { "parts": [ {"text": "Who won the euro 2024?"} ] } ], "tools": [ { "google_search": {} } ] }' You can learn more by trying the Search tool notebook . How grounding with Google Search works When you enable the google_search tool, the model handles the entire workflow of searching, processing, and citing information automatically. User Prompt: Your application sends a user's prompt to the Gemini API with the google_search tool enabled. Prompt Analysis: The model analyzes the prompt and determines if a Google Search can improve the answer. Google Search: If needed, the model automatically generates one or multiple search queries and executes them. Search Results Processing: The model processes the search results, synthesizes the information, and formulates a response. Grounded Response: The API returns a final, user-friendly response that is grounded in the search results. This response includes the model's text answer and groundingMetadata with the search queries, web results, and citations. Understanding the Grounding Response When a response is successfully grounded, the response includes a groundingMetadata field. This structured data is essential for verifying claims and building a rich citation experience in your application. { "candidates" : [ { "content" : { "parts" : [ { "text" : "Spain won Euro 2024, defeating England 2-1 in the final. This victory marks Spain's record fourth European Championship title." } ], "role" : "model" }, "groundingMetadata" : { "webSearchQueries" : [ "UEFA Euro 2024 winner" , "who won euro 2024" ], "searchEntryPoint" : { "renderedContent" : "<!-- HTML and CSS for the search widget -->" }, "groundingChunks" : [ { "web" : { "uri" : "https://vertexaisearch.cloud.google.com....." 
, "title" : "aljazeera.com" }}, { "web" : { "uri" : "https://vertexaisearch.cloud.google.com....." , "title" : "uefa.com" }} ], "groundingSupports" : [ { "segment" : { "startIndex" : 0 , "endIndex" : 85 , "text" : "Spain won Euro 2024, defeatin..." }, "groundingChunkIndices" : [ 0 ] }, { "segment" : { "startIndex" : 86 , "endIndex" : 210 , "text" : "This victory marks Spain's..." }, "groundingChunkIndices" : [ 0 , 1 ] } ] } } ] } The Gemini API returns the following information with the groundingMetadata : webSearchQueries : Array of the search queries used. This is useful for debugging and understanding the model's reasoning process. searchEntryPoint : Contains the HTML and CSS to render the required Search Suggestions. Full usage requirements are detailed in the Terms of Service . groundingChunks : Array of objects containing the web sources ( uri and title ). groundingSupports : Array of chunks to connect model response text to the sources in groundingChunks . Each chunk links a text segment (defined by startIndex and endIndex ) to one or more groundingChunkIndices . This is the key to building inline citations. Grounding with Google Search can also be used in combination with the URL context tool to ground responses in both public web data and the specific URLs you provide. Attributing Sources with inline Citations The API returns structured citation data, giving you complete control over how you display sources in your user interface. You can use the groundingSupports and groundingChunks fields to link the model's statements directly to their sources. Here is a common pattern for processing the metadata to create a response with inline, clickable citations. Python def add_citations ( response ): text = response . text supports = response . candidates [ 0 ] . grounding_metadata . grounding_supports chunks = response . candidates [ 0 ] . grounding_metadata . 
grounding_chunks # Sort supports by end_index in descending order to avoid shifting issues when inserting. sorted_supports = sorted ( supports , key = lambda s : s . segment . end_index , reverse = True ) for support in sorted_supports : end_index = support . segment . end_index if support . grounding_chunk_indices : # Create citation string like [1](link1), [2](link2) citation_links = [] for i in support . grounding_chunk_indices : if i < len ( chunks ): uri = chunks [ i ] . web . uri citation_links . append ( f"[{i + 1}]({uri})" ) citation_string = ", " . join ( citation_links ) text = text [: end_index ] + citation_string + text [ end_index :] return text # Assuming response with grounding metadata text_with_citations = add_citations ( response ) print ( text_with_citations ) JavaScript function addCitations ( response ) { let text = response . text ; // Fall back to empty arrays when the response carries no grounding metadata. const supports = response . candidates [ 0 ] ?. groundingMetadata ?. groundingSupports ?? []; const chunks = response . candidates [ 0 ] ?. groundingMetadata ?. groundingChunks ?? []; // Sort supports by endIndex in descending order to avoid shifting issues when inserting. const sortedSupports = [... supports ]. sort ( ( a , b ) => ( b . segment ?. endIndex ?? 0 ) - ( a . segment ?. endIndex ?? 0 ), ); for ( const support of sortedSupports ) { const endIndex = support . segment ?. endIndex ; if ( endIndex === undefined || ! support . groundingChunkIndices ?. length ) { continue ; } const citationLinks = support . groundingChunkIndices . map ( i => { const uri = chunks [ i ] ?. web ?. uri ; if ( uri ) { return `[${i + 1}](${uri})` ; } return null ; }) . filter ( Boolean ); if ( citationLinks . length > 0 ) { const citationString = citationLinks . join ( ", " ); text = text . slice ( 0 , endIndex ) + citationString + text . slice ( endIndex ); } } return text ; } const textWithCitations = addCitations ( response ); console . 
log ( textWithCitations ); The new response with inline citations will look like this: Spain won Euro 2024, defeating England 2-1 in the final.[1](https:/...), [2](https:/...), [4](https:/...), [5](https:/...) This victory marks Spain's record-breaking fourth European Championship title.[5](https:/...), [2](https:/...), [3](https:/...), [4](https:/...) Pricing When you use Grounding with Google Search, your project is billed per API request that includes the google_search tool. If the model decides to execute multiple search queries to answer a single prompt (for example, searching for "UEFA Euro 2024 winner" and "Spain vs England Euro 2024 final score" within the same API call), this counts as a single billable use of the tool for that request. For detailed pricing information, see the Gemini API pricing page . Supported Models Experimental and Preview models are not included. You can find their capabilities on the model overview page.
Model | Grounding with Google Search
Gemini 2.5 Pro | ✔️
Gemini 2.5 Flash | ✔️
Gemini 2.0 Flash | ✔️
Gemini 1.5 Pro | ✔️
Gemini 1.5 Flash | ✔️
Note: Older models use a google_search_retrieval tool. For all current models, use the google_search tool as shown in the examples. Grounding with Gemini 1.5 Models (Legacy) While the google_search tool is recommended for Gemini 2.0 and later, Gemini 1.5 models support a legacy tool named google_search_retrieval . This tool provides a dynamic mode that allows the model to decide whether to perform a search based on its confidence that the prompt requires fresh information. If the model's confidence is above a dynamic_threshold you set (a value between 0.0 and 1.0), it will perform a search. Python # Note: This is a legacy approach for Gemini 1.5 models. # The 'google_search' tool is recommended for all new development. import os from google import genai from google.genai import types client = genai . Client () retrieval_tool = types . Tool ( google_search_retrieval = types . 
GoogleSearchRetrieval ( dynamic_retrieval_config = types . DynamicRetrievalConfig ( mode = types . DynamicRetrievalConfigMode . MODE_DYNAMIC , dynamic_threshold = 0.7 # Only search if confidence > 70% ) ) ) config = types . GenerateContentConfig ( tools = [ retrieval_tool ] ) response = client . models . generate_content ( model = 'gemini-1.5-flash' , contents = "Who won the euro 2024?" , config = config , ) print ( response . text ) if not response . candidates [ 0 ] . grounding_metadata : print ( " \n Model answered from its own knowledge." ) JavaScript // Note: This is a legacy approach for Gemini 1.5 models. // The 'googleSearch' tool is recommended for all new development. import { GoogleGenAI , DynamicRetrievalConfigMode } from "@google/genai" ; const ai = new GoogleGenAI ({}); const retrievalTool = { googleSearchRetrieval : { dynamicRetrievalConfig : { mode : DynamicRetrievalConfigMode . MODE_DYNAMIC , dynamicThreshold : 0.7 , // Only search if confidence > 70% }, }, }; const config = { tools : [ retrievalTool ], }; const response = await ai . models . generateContent ({ model : "gemini-1.5-flash" , contents : "Who won the euro 2024?" , config , }); console . log ( response . text ); if ( ! response . candidates ? .[ 0 ] ? . groundingMetadata ) { console . log ( "\nModel answered from its own knowledge." ); } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -X POST \ -d '{ "contents": [ {"parts": [{"text": "Who won the euro 2024?"}]} ], "tools": [{ "google_search_retrieval": { "dynamic_retrieval_config": { "mode": "MODE_DYNAMIC", "dynamic_threshold": 0.7 } } }] }' What's next Try the Grounding with Google Search in the Gemini API Cookbook . Learn about other available tools, like Function Calling . Learn how to augment prompts with specific URLs using the URL context tool . 
Last updated 2025-06-27 UTC.
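To make the citation pattern described on this page easier to experiment with offline, here is a minimal sketch that applies the same insert-from-the-end approach to a plain dictionary shaped like the groundingMetadata JSON shown earlier. The sample text, the example.com URIs, and the function name add_citations_from_dict are made up for illustration, and the sketch assumes endIndex can be used as a character offset into the text; if the API reports offsets in another unit, the slicing would need to be adapted.

```python
def add_citations_from_dict(text: str, grounding_metadata: dict) -> str:
    """Insert [n](uri) markers after each supported segment of the text."""
    supports = grounding_metadata.get("groundingSupports", [])
    chunks = grounding_metadata.get("groundingChunks", [])

    # Work from the end of the text backwards so that earlier offsets stay valid.
    ordered = sorted(supports, key=lambda s: s["segment"]["endIndex"], reverse=True)
    for support in ordered:
        end_index = support["segment"]["endIndex"]
        links = [
            f"[{i + 1}]({chunks[i]['web']['uri']})"
            for i in support.get("groundingChunkIndices", [])
            if i < len(chunks)  # skip indices that point past the chunk list
        ]
        if links:
            text = text[:end_index] + ", ".join(links) + text[end_index:]
    return text


# Hypothetical sample data in the shape of the groundingMetadata field.
sample_text = "Spain won. It was their fourth title."
sample_metadata = {
    "groundingChunks": [
        {"web": {"uri": "https://example.com/a", "title": "a"}},
        {"web": {"uri": "https://example.com/b", "title": "b"}},
    ],
    "groundingSupports": [
        {"segment": {"startIndex": 0, "endIndex": 10}, "groundingChunkIndices": [0]},
        {"segment": {"startIndex": 11, "endIndex": 37}, "groundingChunkIndices": [0, 1]},
    ],
}

cited = add_citations_from_dict(sample_text, sample_metadata)
print(cited)
```

Processing the supports in descending endIndex order is the key design choice: each insertion only changes text after positions that still have to be processed, so no offset needs to be recomputed.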
|
text_content/docs_google-search_c7ceaf0e.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/google-search#main-content
|
| 2 |
+
Title: Grounding with Google Search | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Grounding with Google Search Grounding with Google Search connects the Gemini model to real-time web content and works with all available languages . This allows Gemini to provide more accurate answers and cite verifiable sources beyond its knowledge cutoff. Grounding helps you build applications that can: Increase factual accuracy: Reduce model hallucinations by basing responses on real-world information. Access real-time information: Answer questions about recent events and topics. Provide citations: Build user trust by showing the sources for the model's claims. Python from google import genai from google.genai import types # Configure the client client = genai . Client () # Define the grounding tool grounding_tool = types . Tool ( google_search = types . GoogleSearch () ) # Configure generation settings config = types . GenerateContentConfig ( tools = [ grounding_tool ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Who won the euro 2024?" , config = config , ) # Print the grounded response print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Define the grounding tool const groundingTool = { googleSearch : {}, }; // Configure generation settings const config = { tools : [ groundingTool ], }; // Make the request const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : "Who won the euro 2024?" , config , }); // Print the grounded response console . log ( response . 
text ); REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -X POST \ -d '{ "contents": [ { "parts": [ {"text": "Who won the euro 2024?"} ] } ], "tools": [ { "google_search": {} } ] }' You can learn more by trying the Search tool notebook . How grounding with Google Search works When you enable the google_search tool, the model handles the entire workflow of searching, processing, and citing information automatically. User Prompt: Your application sends a user's prompt to the Gemini API with the google_search tool enabled. Prompt Analysis: The model analyzes the prompt and determines if a Google Search can improve the answer. Google Search: If needed, the model automatically generates one or multiple search queries and executes them. Search Results Processing: The model processes the search results, synthesizes the information, and formulates a response. Grounded Response: The API returns a final, user-friendly response that is grounded in the search results. This response includes the model's text answer and groundingMetadata with the search queries, web results, and citations. Understanding the Grounding Response When a response is successfully grounded, the response includes a groundingMetadata field. This structured data is essential for verifying claims and building a rich citation experience in your application. { "candidates" : [ { "content" : { "parts" : [ { "text" : "Spain won Euro 2024, defeating England 2-1 in the final. This victory marks Spain's record fourth European Championship title." } ], "role" : "model" }, "groundingMetadata" : { "webSearchQueries" : [ "UEFA Euro 2024 winner" , "who won euro 2024" ], "searchEntryPoint" : { "renderedContent" : "<!-- HTML and CSS for the search widget -->" }, "groundingChunks" : [ { "web" : { "uri" : "https://vertexaisearch.cloud.google.com....." 
, "title" : "aljazeera.com" }}, { "web" : { "uri" : "https://vertexaisearch.cloud.google.com....." , "title" : "uefa.com" }} ], "groundingSupports" : [ { "segment" : { "startIndex" : 0 , "endIndex" : 85 , "text" : "Spain won Euro 2024, defeatin..." }, "groundingChunkIndices" : [ 0 ] }, { "segment" : { "startIndex" : 86 , "endIndex" : 210 , "text" : "This victory marks Spain's..." }, "groundingChunkIndices" : [ 0 , 1 ] } ] } } ] } The Gemini API returns the following information with the groundingMetadata : webSearchQueries : Array of the search queries used. This is useful for debugging and understanding the model's reasoning process. searchEntryPoint : Contains the HTML and CSS to render the required Search Suggestions. Full usage requirements are detailed in the Terms of Service . groundingChunks : Array of objects containing the web sources ( uri and title ). groundingSupports : Array of chunks to connect model response text to the sources in groundingChunks . Each chunk links a text segment (defined by startIndex and endIndex ) to one or more groundingChunkIndices . This is the key to building inline citations. Grounding with Google Search can also be used in combination with the URL context tool to ground responses in both public web data and the specific URLs you provide. Attributing Sources with inline Citations The API returns structured citation data, giving you complete control over how you display sources in your user interface. You can use the groundingSupports and groundingChunks fields to link the model's statements directly to their sources. Here is a common pattern for processing the metadata to create a response with inline, clickable citations. Python def add_citations ( response ): text = response . text supports = response . candidates [ 0 ] . grounding_metadata . grounding_supports chunks = response . candidates [ 0 ] . grounding_metadata . 
grounding_chunks # Sort supports by end_index in descending order to avoid shifting issues when inserting. sorted_supports = sorted ( supports , key = lambda s : s . segment . end_index , reverse = True ) for support in sorted_supports : end_index = support . segment . end_index if support . grounding_chunk_indices : # Create citation string like [1](link1)[2](link2) citation_links = [] for i in support . grounding_chunk_indices : if i < len ( chunks ): uri = chunks [ i ] . web . uri citation_links . append ( f "[ { i + 1 } ]( { uri } )" ) citation_string = ", " . join ( citation_links ) text = text [: end_index ] + citation_string + text [ end_index :] return text # Assuming response with grounding metadata text_with_citations = add_citations ( response ) print ( text_with_citations ) JavaScript function addCitations ( response ) { let text = response . text ; const supports = response . candidates [ 0 ] ? . groundingMetadata ? . groundingSupports ; const chunks = response . candidates [ 0 ] ? . groundingMetadata ? . groundingChunks ; // Sort supports by end_index in descending order to avoid shifting issues when inserting. const sortedSupports = [... supports ]. sort ( ( a , b ) = > ( b . segment ? . endIndex ?? 0 ) - ( a . segment ? . endIndex ?? 0 ), ); for ( const support of sortedSupports ) { const endIndex = support . segment ? . endIndex ; if ( endIndex === undefined || ! support . groundingChunkIndices ? . length ) { continue ; } const citationLinks = support . groundingChunkIndices . map ( i = > { const uri = chunks [ i ] ? . web ? . uri ; if ( uri ) { return `[ ${ i + 1 } ]( ${ uri } )` ; } return null ; }) . filter ( Boolean ); if ( citationLinks . length > 0 ) { const citationString = citationLinks . join ( ", " ); text = text . slice ( 0 , endIndex ) + citationString + text . slice ( endIndex ); } } return text ; } const textWithCitations = addCitations ( response ); console . 
log ( textWithCitations ); The new response with inline citations will look like this: Spain won Euro 2024, defeating England 2-1 in the final.[1](https:/...), [2](https:/...), [4](https:/...), [5](https:/...) This victory marks Spain's record-breaking fourth European Championship title.[5](https:/...), [2](https:/...), [3](https:/...), [4](https:/...) Pricing When you use Grounding with Google Search, your project is billed per API request that includes the google_search tool. If the model decides to execute multiple search queries to answer a single prompt (for example, searching for "UEFA Euro 2024 winner" and "Spain vs England Euro 2024 final score" within the same API call), this counts as a single billable use of the tool for that request. For detailed pricing information, see the Gemini API pricing page . Supported Models Experimental and Preview models are not included. You can find their capabilities on the model overview page. Model Grounding with Google Search Gemini 2.5 Pro ✔️ Gemini 2.5 Flash ✔️ Gemini 2.0 Flash ✔️ Gemini 1.5 Pro ✔️ Gemini 1.5 Flash ✔️ Note: Older models use a google_search_retrieval tool. For all current models, use the google_search tool as shown in the examples. Grounding with Gemini 1.5 Models (Legacy) While the google_search tool is recommended for Gemini 2.0 and later, Gemini 1.5 models support a legacy tool named google_search_retrieval . This tool provides a dynamic mode that allows the model to decide whether to perform a search based on its confidence that the prompt requires fresh information. If the model's confidence is above a dynamic_threshold you set (a value between 0.0 and 1.0), it will perform a search. Python # Note: This is a legacy approach for Gemini 1.5 models. # The 'google_search' tool is recommended for all new development. from google import genai from google.genai import types client = genai . Client () retrieval_tool = types . Tool ( google_search_retrieval = types . 
GoogleSearchRetrieval ( dynamic_retrieval_config = types . DynamicRetrievalConfig ( mode = types . DynamicRetrievalConfigMode . MODE_DYNAMIC , dynamic_threshold = 0.7 # Only search if confidence > 70% ) ) ) config = types . GenerateContentConfig ( tools = [ retrieval_tool ] ) response = client . models . generate_content ( model = 'gemini-1.5-flash' , contents = "Who won the euro 2024?" , config = config , ) print ( response . text ) if not response . candidates [ 0 ] . grounding_metadata : print ( " \n Model answered from its own knowledge." ) JavaScript // Note: This is a legacy approach for Gemini 1.5 models. // The 'googleSearch' tool is recommended for all new development. import { GoogleGenAI , DynamicRetrievalConfigMode } from "@google/genai" ; const ai = new GoogleGenAI ({}); const retrievalTool = { googleSearchRetrieval : { dynamicRetrievalConfig : { mode : DynamicRetrievalConfigMode . MODE_DYNAMIC , dynamicThreshold : 0.7 , // Only search if confidence > 70% }, }, }; const config = { tools : [ retrievalTool ], }; const response = await ai . models . generateContent ({ model : "gemini-1.5-flash" , contents : "Who won the euro 2024?" , config , }); console . log ( response . text ); if ( ! response . candidates ? .[ 0 ] ? . groundingMetadata ) { console . log ( "\nModel answered from its own knowledge." ); } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -X POST \ -d '{ "contents": [ {"parts": [{"text": "Who won the euro 2024?"}]} ], "tools": [{ "google_search_retrieval": { "dynamic_retrieval_config": { "mode": "MODE_DYNAMIC", "dynamic_threshold": 0.7 } } }] }' What's next Try the Grounding with Google Search in the Gemini API Cookbook . Learn about other available tools, like Function Calling . Learn how to augment prompts with specific URLs using the URL context tool . 
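The groundingChunks array described earlier on this page can also be turned into a plain numbered source list, which is sometimes easier to display than inline links. The following is a minimal sketch, assuming you are working with the raw REST JSON (plain dictionaries shaped like the sample response above) and not with SDK objects. The helper name list_sources and the example.com URIs are illustrative assumptions and are not part of the Gemini API.

```python
# Minimal sketch: operates on the raw REST JSON (plain dicts), not SDK objects.
# `list_sources` is a hypothetical helper name, not part of any Google SDK.

def list_sources(response_json):
    """Return one 'N. title - uri' line per grounding chunk of the first candidate."""
    candidates = response_json.get("candidates") or [{}]
    metadata = candidates[0].get("groundingMetadata") or {}
    lines = []
    for number, chunk in enumerate(metadata.get("groundingChunks", []), start=1):
        web = chunk.get("web", {})
        lines.append(f"{number}. {web.get('title', 'untitled')} - {web.get('uri', '')}")
    return lines


# Sample shaped like the response shown earlier; the URIs are placeholders.
sample = {
    "candidates": [{
        "groundingMetadata": {
            "groundingChunks": [
                {"web": {"uri": "https://example.com/a", "title": "aljazeera.com"}},
                {"web": {"uri": "https://example.com/b", "title": "uefa.com"}},
            ]
        }
    }]
}

for line in list_sources(sample):
    print(line)
```

The numbering is 1-based so that it lines up with the [1], [2] markers produced by the inline citation pattern above. A response without groundingMetadata yields an empty list instead of raising an error.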
Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
|
text_content/docs_grounding_06db198b.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/grounding#main-content
|
| 2 |
+
Title: Grounding with Google Search | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Grounding with Google Search | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Grounding with Google Search Grounding with Google Search connects the Gemini model to real-time web content and works with all available languages . This allows Gemini to provide more accurate answers and cite verifiable sources beyond its knowledge cutoff. Grounding helps you build applications that can: Increase factual accuracy: Reduce model hallucinations by basing responses on real-world information. Access real-time information: Answer questions about recent events and topics. Provide citations: Build user trust by showing the sources for the model's claims. Python from google import genai from google.genai import types # Configure the client client = genai . Client () # Define the grounding tool grounding_tool = types . Tool ( google_search = types . GoogleSearch () ) # Configure generation settings config = types . GenerateContentConfig ( tools = [ grounding_tool ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Who won the euro 2024?" , config = config , ) # Print the grounded response print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Define the grounding tool const groundingTool = { googleSearch : {}, }; // Configure generation settings const config = { tools : [ groundingTool ], }; // Make the request const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : "Who won the euro 2024?" , config , }); // Print the grounded response console . log ( response . 
text ); REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -X POST \ -d '{ "contents": [ { "parts": [ {"text": "Who won the euro 2024?"} ] } ], "tools": [ { "google_search": {} } ] }' You can learn more by trying the Search tool notebook . How grounding with Google Search works When you enable the google_search tool, the model handles the entire workflow of searching, processing, and citing information automatically. User Prompt: Your application sends a user's prompt to the Gemini API with the google_search tool enabled. Prompt Analysis: The model analyzes the prompt and determines if a Google Search can improve the answer. Google Search: If needed, the model automatically generates one or multiple search queries and executes them. Search Results Processing: The model processes the search results, synthesizes the information, and formulates a response. Grounded Response: The API returns a final, user-friendly response that is grounded in the search results. This response includes the model's text answer and groundingMetadata with the search queries, web results, and citations. Understanding the Grounding Response When a response is successfully grounded, the response includes a groundingMetadata field. This structured data is essential for verifying claims and building a rich citation experience in your application. { "candidates" : [ { "content" : { "parts" : [ { "text" : "Spain won Euro 2024, defeating England 2-1 in the final. This victory marks Spain's record fourth European Championship title." } ], "role" : "model" }, "groundingMetadata" : { "webSearchQueries" : [ "UEFA Euro 2024 winner" , "who won euro 2024" ], "searchEntryPoint" : { "renderedContent" : "<!-- HTML and CSS for the search widget -->" }, "groundingChunks" : [ { "web" : { "uri" : "https://vertexaisearch.cloud.google.com....." 
, "title" : "aljazeera.com" }}, { "web" : { "uri" : "https://vertexaisearch.cloud.google.com....." , "title" : "uefa.com" }} ], "groundingSupports" : [ { "segment" : { "startIndex" : 0 , "endIndex" : 85 , "text" : "Spain won Euro 2024, defeatin..." }, "groundingChunkIndices" : [ 0 ] }, { "segment" : { "startIndex" : 86 , "endIndex" : 210 , "text" : "This victory marks Spain's..." }, "groundingChunkIndices" : [ 0 , 1 ] } ] } } ] } The Gemini API returns the following information with the groundingMetadata : webSearchQueries : Array of the search queries used. This is useful for debugging and understanding the model's reasoning process. searchEntryPoint : Contains the HTML and CSS to render the required Search Suggestions. Full usage requirements are detailed in the Terms of Service . groundingChunks : Array of objects containing the web sources ( uri and title ). groundingSupports : Array of chunks to connect model response text to the sources in groundingChunks . Each chunk links a text segment (defined by startIndex and endIndex ) to one or more groundingChunkIndices . This is the key to building inline citations. Grounding with Google Search can also be used in combination with the URL context tool to ground responses in both public web data and the specific URLs you provide. Attributing Sources with inline Citations The API returns structured citation data, giving you complete control over how you display sources in your user interface. You can use the groundingSupports and groundingChunks fields to link the model's statements directly to their sources. Here is a common pattern for processing the metadata to create a response with inline, clickable citations. Python def add_citations ( response ): text = response . text supports = response . candidates [ 0 ] . grounding_metadata . grounding_supports chunks = response . candidates [ 0 ] . grounding_metadata . 
grounding_chunks # Sort supports by end_index in descending order to avoid shifting issues when inserting. sorted_supports = sorted ( supports , key = lambda s : s . segment . end_index , reverse = True ) for support in sorted_supports : end_index = support . segment . end_index if support . grounding_chunk_indices : # Create citation string like [1](link1)[2](link2) citation_links = [] for i in support . grounding_chunk_indices : if i < len ( chunks ): uri = chunks [ i ] . web . uri citation_links . append ( f "[ { i + 1 } ]( { uri } )" ) citation_string = ", " . join ( citation_links ) text = text [: end_index ] + citation_string + text [ end_index :] return text # Assuming response with grounding metadata text_with_citations = add_citations ( response ) print ( text_with_citations ) JavaScript function addCitations ( response ) { let text = response . text ; const supports = response . candidates [ 0 ] ? . groundingMetadata ? . groundingSupports ; const chunks = response . candidates [ 0 ] ? . groundingMetadata ? . groundingChunks ; // Sort supports by end_index in descending order to avoid shifting issues when inserting. const sortedSupports = [... supports ]. sort ( ( a , b ) = > ( b . segment ? . endIndex ?? 0 ) - ( a . segment ? . endIndex ?? 0 ), ); for ( const support of sortedSupports ) { const endIndex = support . segment ? . endIndex ; if ( endIndex === undefined || ! support . groundingChunkIndices ? . length ) { continue ; } const citationLinks = support . groundingChunkIndices . map ( i = > { const uri = chunks [ i ] ? . web ? . uri ; if ( uri ) { return `[ ${ i + 1 } ]( ${ uri } )` ; } return null ; }) . filter ( Boolean ); if ( citationLinks . length > 0 ) { const citationString = citationLinks . join ( ", " ); text = text . slice ( 0 , endIndex ) + citationString + text . slice ( endIndex ); } } return text ; } const textWithCitations = addCitations ( response ); console . 
log ( textWithCitations ); The new response with inline citations will look like this: Spain won Euro 2024, defeating England 2-1 in the final.[1](https:/...), [2](https:/...), [4](https:/...), [5](https:/...) This victory marks Spain's record-breaking fourth European Championship title.[5](https:/...), [2](https:/...), [3](https:/...), [4](https:/...) Pricing When you use Grounding with Google Search, your project is billed per API request that includes the google_search tool. If the model decides to execute multiple search queries to answer a single prompt (for example, searching for "UEFA Euro 2024 winner" and "Spain vs England Euro 2024 final score" within the same API call), this counts as a single billable use of the tool for that request. For detailed pricing information, see the Gemini API pricing page . Supported Models Experimental and Preview models are not included. You can find their capabilities on the model overview page. Model Grounding with Google Search Gemini 2.5 Pro ✔️ Gemini 2.5 Flash ✔️ Gemini 2.0 Flash ✔️ Gemini 1.5 Pro ✔️ Gemini 1.5 Flash ✔️ Note: Older models use a google_search_retrieval tool. For all current models, use the google_search tool as shown in the examples. Grounding with Gemini 1.5 Models (Legacy) While the google_search tool is recommended for Gemini 2.0 and later, Gemini 1.5 models support a legacy tool named google_search_retrieval . This tool provides a dynamic mode that allows the model to decide whether to perform a search based on its confidence that the prompt requires fresh information. If the model's confidence is above a dynamic_threshold you set (a value between 0.0 and 1.0), it will perform a search. Python # Note: This is a legacy approach for Gemini 1.5 models. # The 'google_search' tool is recommended for all new development. from google import genai from google.genai import types client = genai . Client () retrieval_tool = types . Tool ( google_search_retrieval = types . 
GoogleSearchRetrieval ( dynamic_retrieval_config = types . DynamicRetrievalConfig ( mode = types . DynamicRetrievalConfigMode . MODE_DYNAMIC , dynamic_threshold = 0.7 # Only search if confidence > 70% ) ) ) config = types . GenerateContentConfig ( tools = [ retrieval_tool ] ) response = client . models . generate_content ( model = 'gemini-1.5-flash' , contents = "Who won the euro 2024?" , config = config , ) print ( response . text ) if not response . candidates [ 0 ] . grounding_metadata : print ( " \n Model answered from its own knowledge." ) JavaScript // Note: This is a legacy approach for Gemini 1.5 models. // The 'googleSearch' tool is recommended for all new development. import { GoogleGenAI , DynamicRetrievalConfigMode } from "@google/genai" ; const ai = new GoogleGenAI ({}); const retrievalTool = { googleSearchRetrieval : { dynamicRetrievalConfig : { mode : DynamicRetrievalConfigMode . MODE_DYNAMIC , dynamicThreshold : 0.7 , // Only search if confidence > 70% }, }, }; const config = { tools : [ retrievalTool ], }; const response = await ai . models . generateContent ({ model : "gemini-1.5-flash" , contents : "Who won the euro 2024?" , config , }); console . log ( response . text ); if ( ! response . candidates ? .[ 0 ] ? . groundingMetadata ) { console . log ( "\nModel answered from its own knowledge." ); } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -X POST \ -d '{ "contents": [ {"parts": [{"text": "Who won the euro 2024?"}]} ], "tools": [{ "google_search_retrieval": { "dynamic_retrieval_config": { "mode": "MODE_DYNAMIC", "dynamic_threshold": 0.7 } } }] }' What's next Try the Grounding with Google Search in the Gemini API Cookbook . Learn about other available tools, like Function Calling . Learn how to augment prompts with specific URLs using the URL context tool . 
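The dynamic mode described in the legacy section above comes down to a threshold comparison. The following is an illustration only: the real decision is made inside the API, and the score is produced by the model, not by your code. The names should_search and prediction_score are hypothetical, and the strict "greater than" comparison is an assumption taken from the phrase "above a dynamic_threshold".

```python
# Illustration only: the real decision happens inside the API.
# `should_search` and `prediction_score` are hypothetical names.

def should_search(prediction_score: float, dynamic_threshold: float = 0.7) -> bool:
    """Return True when the score is strictly above the configured threshold."""
    if not 0.0 <= dynamic_threshold <= 1.0:
        raise ValueError("dynamic_threshold must be between 0.0 and 1.0")
    return prediction_score > dynamic_threshold


for score in (0.5, 0.9):
    print(score, should_search(score))
```

Under this sketch, a threshold near 0.0 makes almost every prompt trigger a search, while a threshold of 1.0 never does.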
|
text_content/docs_image-generation_15f086c8.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/image-generation#imagen-prompt-guide
|
| 2 |
+
Title: Image generation | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Image generation | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Image generation You can generate images using the Gemini API with either Gemini's built-in multimodal capabilities or Imagen, Google's specialized image generation models. For most use cases, start with Gemini . Choose Imagen for specialized tasks where image quality is critical. See Choosing the right model section for more guidance. All generated images include a SynthID watermark . Before you begin Ensure you use a supported model and version for image generation: For Gemini , use Gemini 2.0 Flash Preview Image Generation. For Imagen , use one of the Imagen models (Imagen 3, Imagen 4 or Imagen 4 Ultra). Note that those models are only available on the Paid tier . You can access both Gemini and Imagen models using the same libraries. Note: Image generation may not be available in all regions and countries, review our Models page for more information. Generate images using Gemini Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both to achieve various image-related tasks, such as image generation and editing. You must include responseModalities : ["TEXT", "IMAGE"] in your configuration. Image-only output is not supported with these models. Image generation (text-to-image) The following code demonstrates how to generate an image based on a descriptive prompt: Python from google import genai from google.genai import types from PIL import Image from io import BytesIO import base64 client = genai . 
Client () contents = ( 'Hi, can you create a 3d rendered image of a pig ' 'with wings and a top hat flying over a happy ' 'futuristic scifi city with lots of greenery?' ) response = client . models . generate_content ( model = "gemini-2.0-flash-preview-image-generation" , contents = contents , config = types . GenerateContentConfig ( response_modalities = [ 'TEXT' , 'IMAGE' ] ) ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO ( part . inline_data . data )) image . save ( 'gemini-native-image.png' ) image . show () JavaScript Note: We've released the Google SDK for TypeScript and JavaScript in preview launch stage . Use this SDK for image generation features. import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const contents = "Hi, can you create a 3d rendered image of a pig " + "with wings and a top hat flying over a happy " + "futuristic scifi city with lots of greenery?" ; // Set responseModalities to include "Image" so the model can generate an image const response = await ai . models . generateContent ({ model : "gemini-2.0-flash-preview-image-generation" , contents : contents , config : { responseModalities : [ Modality . TEXT , Modality . IMAGE ], }, }); for ( const part of response . candidates [ 0 ]. content . parts ) { // Based on the part type, either show the text or save the image if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "log" "os" "google.golang.org/genai" ) func main () { ctx := context . 
Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateContentConfig { ResponseModalities : [] string { "TEXT" , "IMAGE" }, } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.0-flash-preview-image-generation" , genai . Text ( "Hi, can you create a 3d rendered image of a pig " + "with wings and a top hat flying over a happy " + "futuristic scifi city with lots of greenery?" ), config , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"} ] }], "generationConfig":{"responseModalities":["TEXT","IMAGE"]} }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-native-image.png AI-generated image of a fantastical flying pig Image editing (text-and-image-to-image) To perform image editing, add an image as input. The following example demonstrates uploading base64-encoded images. For multiple images and larger payloads, check the image input section. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO image = Image . open ( '/path/to/image.png' ) client = genai . Client () text_input = ( 'Hi, This is a picture of me. ' 'Can you add a llama next to me?' ) response = client . models . 
generate_content ( model = "gemini-2.0-flash-preview-image-generation" , contents = [ text_input , image ], config = types . GenerateContentConfig ( response_modalities = [ 'TEXT' , 'IMAGE' ] ) ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO ( part . inline_data . data )) image . show () JavaScript Note: We've released the Google SDK for TypeScript and JavaScript in preview launch stage . Use this SDK for image generation features. import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); // Load the image from the local file system const imagePath = "path/to/image.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); // Prepare the content parts const contents = [ { text : "Can you add a llama next to the image?" }, { inlineData : { mimeType : "image/png" , data : base64Image , }, }, ]; // Set responseModalities to include "Image" so the model can generate an image const response = await ai . models . generateContent ({ model : "gemini-2.0-flash-preview-image-generation" , contents : contents , config : { responseModalities : [ Modality . TEXT , Modality . IMAGE ], }, }); for ( const part of response . candidates [ 0 ]. content . parts ) { // Based on the part type, either show the text or save the image if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "log" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . 
NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/image.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { genai . NewPartFromText ( "Hi, This is a picture of me. Can you add a llama next to me?" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } config := & genai . GenerateContentConfig { ResponseModalities : [] string { "TEXT" , "IMAGE" }, } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.0-flash-preview-image-generation" , contents , config , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH=/path/to/your/image1.jpeg; if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then B64FLAGS="--input"; else B64FLAGS="-w0"; fi; IMG_BASE64=$(base64 "$B64FLAGS" "$IMG_PATH" 2>&1); curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY" \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ {\"text\": \"Hi, This is a picture of me. Can you add a llama next to me?\"}, { \"inline_data\": { \"mime_type\":\"image/jpeg\", \"data\": \"$IMG_BASE64\" } } ] }], \"generationConfig\": {\"responseModalities\": [\"TEXT\", \"IMAGE\"]} }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-edited-image.png Other image generation modes Gemini supports other image interaction modes based on prompt structure and context, including: Text to image(s) and text (interleaved): Outputs images with related text. 
Example prompt: "Generate an illustrated recipe for a paella." Image(s) and text to image(s) and text (interleaved) : Uses input images and text to create new related images and text. Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? can you update the image?" Multi-turn image editing (chat): Keep generating / editing images conversationally. Example prompts: [upload an image of a blue car.] , "Turn this car into a convertible.", "Now change the color to yellow." Limitations For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN. Image generation does not support audio or video inputs. Image generation may not always trigger: The model may output text only. Try asking for image outputs explicitly (e.g. "generate an image", "provide images as you go along", "update the image"). The model may stop generating partway through. Try again or try a different prompt. When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text. There are some regions/countries where Image generation is not available. See Models for more information. Generate images using the Imagen models This example demonstrates generating images with an Imagen model : Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () response = client . models . generate_images ( model = 'imagen-4.0-generate-preview-06-06' , prompt = 'Robot holding a red skateboard' , config = types . GenerateImagesConfig ( number_of_images = 4 , ) ) for generated_image in response . generated_images : generated_image . image . show () JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const response = await ai . models . 
generateImages ({ model : 'imagen-4.0-generate-preview-06-06' , prompt : 'Robot holding a red skateboard' , config : { numberOfImages : 4 , }, }); let idx = 1 ; for ( const generatedImage of response . generatedImages ) { let imgBytes = generatedImage . image . imageBytes ; const buffer = Buffer . from ( imgBytes , "base64" ); fs . writeFileSync ( `imagen-${idx}.png` , buffer ); idx ++ ; } } main (); Go package main import ( "context" "fmt" "log" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateImagesConfig { NumberOfImages : 4 , } response , _ := client . Models . GenerateImages ( ctx , "imagen-4.0-generate-preview-06-06" , "Robot holding a red skateboard" , config , ) for n , image := range response . GeneratedImages { fname := fmt . Sprintf ( "imagen-%d.png" , n ) _ = os . WriteFile ( fname , image . Image . ImageBytes , 0644 ) } } REST curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/imagen-4.0-generate-preview-06-06:predict" \ -H "x-goog-api-key: $GEMINI_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "instances": [ { "prompt": "Robot holding a red skateboard" } ], "parameters": { "sampleCount": 4 } }' AI-generated image of a robot holding a red skateboard Imagen configuration Imagen supports English-only prompts at this time and the following parameters: Note: Naming conventions of parameters vary by programming language. numberOfImages : The number of images to generate, from 1 to 4 (inclusive). The default is 4. For Imagen 4 Ultra, it defaults to 1 as only one image can be generated at a time. aspectRatio : Changes the aspect ratio of the generated image. Supported values are "1:1" , "3:4" , "4:3" , "9:16" , and "16:9" . The default is "1:1" . personGeneration : Allow the model to generate images of people. 
The following values are supported: "dont_allow" : Block generation of images of people. "allow_adult" : Generate images of adults, but not children. This is the default. "allow_all" : Generate images that include adults and children. Note: The "allow_all" parameter value is not allowed in EU, UK, CH, and MENA locations. Choosing the right model Choose Gemini when: You need contextually relevant images that leverage world knowledge and reasoning. Seamlessly blending text and images is important. You want accurate visuals embedded within long text sequences. You want to edit images conversationally while maintaining context. Choose Imagen when: Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities. You need to perform specialized editing tasks like product background updates or image upscaling. You want to infuse branding or style, or generate logos and product designs. Imagen 4 should be your go-to model when you start generating images with Imagen. Choose Imagen 4 Ultra for advanced use cases or when you need the best image quality. Note that Imagen 4 Ultra can only generate one image at a time. Imagen prompt guide This section of the Imagen guide shows you how modifying a text-to-image prompt can produce different results, along with examples of images you can create. Prompt writing basics Note: Maximum prompt length is 480 tokens. A good prompt is descriptive and clear, and makes use of meaningful keywords and modifiers. Start by thinking of your subject , context , and style . Image text: A sketch ( style ) of a modern apartment building ( subject ) surrounded by skyscrapers ( context and background ). Subject : The first thing to think about with any prompt is the subject : the object, person, animal, or scenery you want an image of. Context and background: Just as important is the background or context in which the subject will be placed. Try placing your subject in a variety of backgrounds. 
For example, a studio with a white background, outdoors, or indoor environments. Style: Finally, add the style of image you want. Styles can be general (painting, photograph, sketches) or very specific (pastel painting, charcoal drawing, isometric 3D). You can also combine styles. After you write a first version of your prompt, refine your prompt by adding more details until you get to the image that you want. Iteration is important. Start by establishing your core idea, and then refine and expand upon that core idea until the generated image is close to your vision. Prompt: A park in the spring next to a lake Prompt: A park in the spring next to a lake, the sun sets across the lake, golden hour Prompt: A park in the spring next to a lake, the sun sets across the lake, golden hour, red wildflowers Imagen models can transform your ideas into detailed images, whether your prompts are short or long and detailed. Refine your vision through iterative prompting, adding details until you achieve the perfect result. Short prompts let you generate an image quickly. Prompt: close-up photo of a woman in her 20s, street photography, movie still, muted orange warm tones Longer prompts let you add specific details and build your image. Prompt: captivating photo of a woman in her 20s utilizing a street photography style. The image should look like a movie still with muted orange warm tones. Additional advice for Imagen prompt writing: Use descriptive language : Employ detailed adjectives and adverbs to paint a clear picture for Imagen. Provide context : If necessary, include background information to aid the AI's understanding. Reference specific artists or styles : If you have a particular aesthetic in mind, referencing specific artists or art movements can be helpful. Use prompt engineering tools : Consider exploring prompt engineering tools or resources to help you refine your prompts and achieve optimal results. 
Enhancing the facial details in your personal and group images : Specify facial details as a focus of the photo (for example, use the word "portrait" in the prompt). Generate text in images Imagen models can add text into images, opening up more creative image generation possibilities. Use the following guidance to get the most out of this feature: Iterate with confidence : You might have to regenerate images until you achieve the look you want. Imagen's text integration is still evolving, and sometimes multiple attempts yield the best results. Keep it short : Limit text to 25 characters or less for optimal generation. Multiple phrases : Experiment with two or three distinct phrases to provide additional information. Avoid exceeding three phrases for cleaner compositions. Prompt: A poster with the text "Summerland" in bold font as a title, underneath this text is the slogan "Summer never felt so good" Guide Placement : While Imagen can attempt to position text as directed, expect occasional variations. This feature is continually improving. Inspire font style : Specify a general font style to subtly influence Imagen's choices. Don't rely on precise font replication, but expect creative interpretations. Font size : Specify a font size or a general indication of size (for example, small , medium , large ) to influence the font size generation. Prompt parameterization To better control output results, you might find it helpful to parameterize the inputs into Imagen. For example, suppose you want your customers to be able to generate logos for their business, and you want to make sure logos are always generated on a solid color background. You also want to limit the options that the client can select from a menu. In this example, you can create a parameterized prompt similar to the following: A {logo_style} logo for a {company_area} company on a solid color background. Include the text {company_name} . 
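The template above can be filled in programmatically before the prompt is sent to Imagen. The following is a minimal sketch, not part of the original guide: the menu options in `ALLOWED_OPTIONS`, the helper name `build_logo_prompt`, and the length check are all hypothetical, and the 25-character limit simply reuses the "Keep it short" advice from the text-in-images guidance.

```python
# Hypothetical menu options; the names and values are illustrative only.
ALLOWED_OPTIONS = {
    "logo_style": {"minimalist", "modern", "traditional"},
    "company_area": {"health care", "software", "baking"},
}

# The parameterized prompt from the guide, written as a Python format string.
PROMPT_TEMPLATE = (
    "A {logo_style} logo for a {company_area} company "
    "on a solid color background. Include the text {company_name}."
)

# Assumption: reuse the guide's "25 characters or less" advice for image text.
MAX_TEXT_CHARS = 25


def build_logo_prompt(logo_style: str, company_area: str, company_name: str) -> str:
    """Fill the template after checking each menu value against its allowed set."""
    choices = {"logo_style": logo_style, "company_area": company_area}
    for field, value in choices.items():
        if value not in ALLOWED_OPTIONS[field]:
            raise ValueError(f"unsupported {field}: {value!r}")
    name = company_name.strip()
    if not name or len(name) > MAX_TEXT_CHARS:
        raise ValueError("company_name must be 1 to 25 characters")
    return PROMPT_TEMPLATE.format(
        logo_style=logo_style, company_area=company_area, company_name=name
    )


if __name__ == "__main__":
    print(build_logo_prompt("minimalist", "health care", "Journey"))
```

Restricting the style and business area to a fixed set keeps free-form user input out of the prompt, so only the company name varies.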
In your custom user interface, the customer can input the parameters using a menu, and their chosen value populates the prompt Imagen receives. For example: Prompt: A minimalist logo for a health care company on a solid color background. Include the text Journey . Prompt: A modern logo for a software company on a solid color background. Include the text Silo . Prompt: A traditional logo for a baking company on a solid color background. Include the text Seed . Advanced prompt writing techniques Use the following examples to create more specific prompts based on attributes like photography descriptors, shapes and materials, historical art movements, and image quality modifiers. Photography Prompt includes: "A photo of..." To use this style, start with using keywords that clearly tell Imagen that you're looking for a photograph. Start your prompts with "A photo of. . ." . For example: Prompt: A photo of coffee beans in a kitchen on a wooden surface Prompt: A photo of a chocolate bar on a kitchen counter Prompt: A photo of a modern building with water in the background Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Photography modifiers In the following examples, you can see several photography-specific modifiers and parameters. You can combine multiple modifiers for more precise control. 
Camera Proximity - Close up, taken from far away Prompt: A close-up photo of coffee beans Prompt: A zoomed out photo of a small bag of coffee beans in a messy kitchen Camera Position - aerial, from below Prompt: aerial photo of urban city with skyscrapers Prompt: A photo of a forest canopy with blue skies from below Lighting - natural, dramatic, warm, cold Prompt: studio photo of a modern arm chair, natural lighting Prompt: studio photo of a modern arm chair, dramatic lighting Camera Settings - motion blur, soft focus, bokeh, portrait Prompt: photo of a city with skyscrapers from the inside of a car with motion blur Prompt: soft focus photograph of a bridge in an urban city at night Lens types - 35mm, 50mm, fisheye, wide angle, macro Prompt: photo of a leaf, macro lens Prompt: street photography, new york city, fisheye lens Film types - black and white, polaroid Prompt: a polaroid portrait of a dog wearing sunglasses Prompt: black and white photo of a dog wearing sunglasses Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Illustration and art Prompt includes: "A painting of..." , "A sketch of..." Art styles vary from monochrome styles like pencil sketches, to hyper-realistic digital art. For example, the following images use the same prompt with different styles: "An [art style or creation technique] of an angular sporty electric sedan with skyscrapers in the background" Prompt: A technical pencil drawing of an angular... Prompt: A charcoal drawing of an angular... Prompt: A color pencil drawing of an angular... Prompt: A pastel painting of an angular... Prompt: A digital art of an angular... Prompt: An art deco (poster) of an angular... Image source: Each image was generated using its corresponding text prompt with the Imagen 2 model. Shapes and materials Prompt includes: "...made of..." , "...in the shape of..." 
One of the strengths of this technology is that you can create imagery that is otherwise difficult or impossible. For example, you can recreate your company logo in different materials and textures. Prompt: a duffle bag made of cheese Prompt: neon tubes in the shape of a bird Prompt: an armchair made of paper , studio photo, origami style Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Historical art references Prompt includes: "...in the style of..." Certain styles have become iconic over the years. The following are some ideas of historical painting or art styles that you can try. "generate an image in the style of [art period or movement] : a wind farm" Prompt: generate an image in the style of an impressionist painting : a wind farm Prompt: generate an image in the style of a renaissance painting : a wind farm Prompt: generate an image in the style of pop art : a wind farm Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Image quality modifiers Certain keywords can let the model know that you're looking for a high-quality asset. Examples of quality modifiers include the following: General Modifiers - high-quality, beautiful, stylized Photos - 4K, HDR, Studio Photo Art, Illustration - by a professional, detailed The following are a few examples of prompts without quality modifiers and the same prompt with quality modifiers. Prompt (no quality modifiers): a photo of a corn stalk Prompt (with quality modifiers): 4k HDR beautiful photo of a corn stalk taken by a professional photographer Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Aspect ratios Imagen image generation lets you set five distinct image aspect ratios. Square (1:1, default) - A standard square photo. Common uses for this aspect ratio include social media posts. Fullscreen (4:3) - This aspect ratio is commonly used in media or film. 
It also matches the proportions of most old (non-widescreen) TVs and medium format cameras. It captures more of the scene horizontally (compared to 1:1), making it a preferred aspect ratio for photography. Prompt: close up of a musician's fingers playing the piano, black and white film, vintage (4:3 aspect ratio) Prompt: A professional studio photo of french fries for a high end restaurant, in the style of a food magazine (4:3 aspect ratio) Portrait full screen (3:4) - This is the fullscreen aspect ratio rotated 90 degrees. This lets you capture more of the scene vertically compared to the 1:1 aspect ratio. Prompt: a woman hiking, close of her boots reflected in a puddle, large mountains in the background, in the style of an advertisement, dramatic angles (3:4 aspect ratio) Prompt: aerial shot of a river flowing up a mystical valley (3:4 aspect ratio) Widescreen (16:9) - This ratio has replaced 4:3 and is now the most common aspect ratio for TVs, monitors, and mobile phone screens (landscape). Use this aspect ratio when you want to capture more of the background (for example, scenic landscapes). Prompt: a man wearing all white clothing sitting on the beach, close up, golden hour lighting (16:9 aspect ratio) Portrait (9:16) - This ratio is widescreen but rotated. This is a relatively new aspect ratio that has been popularized by short-form video apps (for example, YouTube Shorts). Use this for tall objects with strong vertical orientations such as buildings, trees, waterfalls, or other similar objects. Prompt: a digital render of a massive skyscraper, modern, grand, epic with a beautiful sunset in the background (9:16 aspect ratio) Photorealistic images Different versions of the image generation model might offer a mix of artistic and photorealistic output. Use the following wording in prompts to generate more photorealistic output, based on the subject you want to generate. Note: Take these keywords as general guidance when you try to create photorealistic images. 
They aren't required to achieve your goal. Use case Lens type Focal lengths Additional details People (portraits) Prime, zoom 24-35mm black and white film, Film noir, Depth of field, duotone (mention two colors) Food, insects, plants (objects, still life) Macro 60-105mm High detail, precise focusing, controlled lighting Sports, wildlife (motion) Telephoto zoom 100-400mm Fast shutter speed, Action or movement tracking Astronomical, landscape (wide-angle) Wide-angle 10-24mm Long exposure times, sharp focus, long exposure, smooth water or clouds Portraits Use case Lens type Focal lengths Additional details People (portraits) Prime, zoom 24-35mm black and white film, Film noir, Depth of field, duotone (mention two colors) Using several keywords from the table, Imagen can generate the following portraits: Prompt: A woman, 35mm portrait, blue and grey duotones Model: imagen-3.0-generate-002 Prompt: A woman, 35mm portrait, film noir Model: imagen-3.0-generate-002 Objects Use case Lens type Focal lengths Additional details Food, insects, plants (objects, still life) Macro 60-105mm High detail, precise focusing, controlled lighting Using several keywords from the table, Imagen can generate the following object images: Prompt: leaf of a prayer plant, macro lens, 60mm Model: imagen-3.0-generate-002 Prompt: a plate of pasta, 100mm Macro lens Model: imagen-3.0-generate-002 Motion Use case Lens type Focal lengths Additional details Sports, wildlife (motion) Telephoto zoom 100-400mm Fast shutter speed, Action or movement tracking Using several keywords from the table, Imagen can generate the following motion images: Prompt: a winning touchdown, fast shutter speed, movement tracking Model: imagen-3.0-generate-002 Prompt: A deer running in the forest, fast shutter speed, movement tracking Model: imagen-3.0-generate-002 Wide-angle Use case Lens type Focal lengths Additional details Astronomical, landscape (wide-angle) Wide-angle 10-24mm Long exposure times, sharp focus, long 
exposure, smooth water or clouds Using several keywords from the table, Imagen can generate the following wide-angle images: Prompt: an expansive mountain range, landscape wide angle 10mm Model: imagen-3.0-generate-002 Prompt: a photo of the moon, astro photography, wide angle 10mm Model: imagen-3.0-generate-002 What's next Check out the Veo guide to learn how to generate videos with the Gemini API. To learn more about Gemini models, see Gemini models and Experimental models . Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Last updated 2025-06-27 UTC.
|
text_content/docs_image-generation_55547714.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/image-generation
|
| 2 |
+
Title: Image generation | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Image generation You can generate images using the Gemini API with either Gemini's built-in multimodal capabilities or Imagen, Google's specialized image generation models. For most use cases, start with Gemini . Choose Imagen for specialized tasks where image quality is critical. See the Choosing the right model section for more guidance. All generated images include a SynthID watermark . Before you begin Ensure you use a supported model and version for image generation: For Gemini , use Gemini 2.0 Flash Preview Image Generation. For Imagen , use one of the Imagen models (Imagen 3, Imagen 4 or Imagen 4 Ultra). Note that these models are only available on the Paid tier . You can access both Gemini and Imagen models using the same libraries. Note: Image generation may not be available in all regions and countries; review our Models page for more information. Generate images using Gemini Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both to achieve various image-related tasks, such as image generation and editing. You must include responseModalities : ["TEXT", "IMAGE"] in your configuration. Image-only output is not supported with these models. Image generation (text-to-image) The following code demonstrates how to generate an image based on a descriptive prompt: Python from google import genai from google.genai import types from PIL import Image from io import BytesIO import base64 client = genai . 
Client () contents = ( 'Hi, can you create a 3d rendered image of a pig ' 'with wings and a top hat flying over a happy ' 'futuristic scifi city with lots of greenery?' ) response = client . models . generate_content ( model = "gemini-2.0-flash-preview-image-generation" , contents = contents , config = types . GenerateContentConfig ( response_modalities = [ 'TEXT' , 'IMAGE' ] ) ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO (( part . inline_data . data ))) image . save ( 'gemini-native-image.png' ) image . show () JavaScript Note: We've released the Google SDK for TypeScript and JavaScript in preview launch stage . Use this SDK for image generation features. import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const contents = "Hi, can you create a 3d rendered image of a pig " + "with wings and a top hat flying over a happy " + "futuristic scifi city with lots of greenery?" ; // Set responseModalities to include "Image" so the model can generate an image const response = await ai . models . generateContent ({ model : "gemini-2.0-flash-preview-image-generation" , contents : contents , config : { responseModalities : [ Modality . TEXT , Modality . IMAGE ], }, }); for ( const part of response . candidates [ 0 ]. content . parts ) { // Based on the part type, either show the text or save the image if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "log" "os" "google.golang.org/genai" ) func main () { ctx := context . 
Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateContentConfig { ResponseModalities : [] string { "TEXT" , "IMAGE" }, } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.0-flash-preview-image-generation" , genai . Text ( "Hi, can you create a 3d rendered image of a pig " + "with wings and a top hat flying over a happy " + "futuristic scifi city with lots of greenery?" ), config , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"} ] }], "generationConfig":{"responseModalities":["TEXT","IMAGE"]} }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-native-image.png AI-generated image of a fantastical flying pig Image editing (text-and-image-to-image) To perform image editing, add an image as input. The following example demonstrates uploading base64 encoded images. For multiple images and larger payloads, check the image input section. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO import PIL.Image image = PIL . Image . open ( '/path/to/image.png' ) client = genai . Client () text_input = ( 'Hi, This is a picture of me. ' 'Can you add a llama next to me?' ) response = client . models . 
generate_content ( model = "gemini-2.0-flash-preview-image-generation" , contents = [ text_input , image ], config = types . GenerateContentConfig ( response_modalities = [ 'TEXT' , 'IMAGE' ] ) ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO (( part . inline_data . data ))) image . show () JavaScript Note: We've released the Google SDK for TypeScript and JavaScript in preview launch stage . Use this SDK for image generation features. import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); // Load the image from the local file system const imagePath = "path/to/image.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); // Prepare the content parts const contents = [ { text : "Can you add a llama next to the image?" }, { inlineData : { mimeType : "image/png" , data : base64Image , }, }, ]; // Set responseModalities to include "Image" so the model can generate an image const response = await ai . models . generateContent ({ model : "gemini-2.0-flash-preview-image-generation" , contents : contents , config : { responseModalities : [ Modality . TEXT , Modality . IMAGE ], }, }); for ( const part of response . candidates [ 0 ]. content . parts ) { // Based on the part type, either show the text or save the image if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "log" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . 
NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/image.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { genai . NewPartFromText ( "Hi, This is a picture of me. Can you add a llama next to me?" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } config := & genai . GenerateContentConfig { ResponseModalities : [] string { "TEXT" , "IMAGE" }, } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.0-flash-preview-image-generation" , contents , config , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/image1.jpeg if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ {\"text\": \"'Hi, This is a picture of me. Can you add a llama next to me\"}, { \"inline_data\": { \"mime_type\":\"image/jpeg\", \"data\": \" $IMG_BASE64 \" } } ] }], \"generationConfig\": {\"responseModalities\": [\"TEXT\", \"IMAGE\"]} }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-edited-image.png Other image generation modes Gemini supports other image interaction modes based on prompt structure and context, including: Text to image(s) and text (interleaved): Outputs images with related text. 
Example prompt: "Generate an illustrated recipe for a paella." Image(s) and text to image(s) and text (interleaved) : Uses input images and text to create new related images and text. Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? can you update the image?" Multi-turn image editing (chat): Keep generating / editing images conversationally. Example prompts: [upload an image of a blue car.] , "Turn this car into a convertible.", "Now change the color to yellow." Limitations For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN. Image generation does not support audio or video inputs. Image generation may not always trigger: The model may output text only. Try asking for image outputs explicitly (e.g. "generate an image", "provide images as you go along", "update the image"). The model may stop generating partway through. Try again or try a different prompt. When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text. There are some regions/countries where Image generation is not available. See Models for more information. Generate images using the Imagen models This example demonstrates generating images with an Imagen model : Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () response = client . models . generate_images ( model = 'imagen-4.0-generate-preview-06-06' , prompt = 'Robot holding a red skateboard' , config = types . GenerateImagesConfig ( number_of_images = 4 , ) ) for generated_image in response . generated_images : generated_image . image . show () JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const response = await ai . models . 
generateImages ({ model : 'imagen-4.0-generate-preview-06-06' , prompt : 'Robot holding a red skateboard' , config : { numberOfImages : 4 , }, }); let idx = 1 ; for ( const generatedImage of response . generatedImages ) { let imgBytes = generatedImage . image . imageBytes ; const buffer = Buffer . from ( imgBytes , "base64" ); fs . writeFileSync ( `imagen-${idx}.png` , buffer ); idx ++ ; } } main (); Go package main import ( "context" "fmt" "log" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateImagesConfig { NumberOfImages : 4 , } response , _ := client . Models . GenerateImages ( ctx , "imagen-4.0-generate-preview-06-06" , "Robot holding a red skateboard" , config , ) for n , image := range response . GeneratedImages { fname := fmt . Sprintf ( "imagen-%d.png" , n ) _ = os . WriteFile ( fname , image . Image . ImageBytes , 0644 ) } } REST curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/imagen-4.0-generate-preview-06-06:predict" \ -H "x-goog-api-key: $GEMINI_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "instances": [ { "prompt": "Robot holding a red skateboard" } ], "parameters": { "sampleCount": 4 } }' AI-generated image of a robot holding a red skateboard Imagen configuration Imagen supports English-only prompts at this time and the following parameters: Note: Naming conventions of parameters vary by programming language. numberOfImages : The number of images to generate, from 1 to 4 (inclusive). The default is 4. For Imagen 4 Ultra, it defaults to 1 as only one image can be generated at a time. aspectRatio : Changes the aspect ratio of the generated image. Supported values are "1:1" , "3:4" , "4:3" , "9:16" , and "16:9" . The default is "1:1" . personGeneration : Allow the model to generate images of people. 
The following values are supported: "dont_allow" : Block generation of images of people. "allow_adult" : Generate images of adults, but not children. This is the default. "allow_all" : Generate images that include adults and children. Note: The "allow_all" parameter value is not allowed in EU, UK, CH, and MENA locations. Choosing the right model Choose Gemini when: You need contextually relevant images that leverage world knowledge and reasoning. Seamlessly blending text and images is important. You want accurate visuals embedded within long text sequences. You want to edit images conversationally while maintaining context. Choose Imagen when: Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities. You need to perform specialized editing tasks like product background updates or image upscaling. You want to infuse branding or style, or generate logos and product designs. Imagen 4 should be your go-to model when you start generating images with Imagen. Choose Imagen 4 Ultra for advanced use cases or when you need the best image quality. Note that Imagen 4 Ultra can only generate one image at a time. Imagen prompt guide This section of the Imagen guide shows you how modifying a text-to-image prompt can produce different results, along with examples of images you can create. Prompt writing basics Note: Maximum prompt length is 480 tokens. A good prompt is descriptive and clear, and makes use of meaningful keywords and modifiers. Start by thinking of your subject , context , and style . Image text: A sketch ( style ) of a modern apartment building ( subject ) surrounded by skyscrapers ( context and background ). Subject : The first thing to think about with any prompt is the subject : the object, person, animal, or scenery you want an image of. Context and background: Just as important is the background or context in which the subject will be placed. Try placing your subject in a variety of backgrounds. 
For example, a studio with a white background, outdoors, or indoor environments. Style: Finally, add the style of image you want. Styles can be general (painting, photograph, sketches) or very specific (pastel painting, charcoal drawing, isometric 3D). You can also combine styles. After you write a first version of your prompt, refine your prompt by adding more details until you get to the image that you want. Iteration is important. Start by establishing your core idea, and then refine and expand upon that core idea until the generated image is close to your vision. Prompt: A park in the spring next to a lake Prompt: A park in the spring next to a lake, the sun sets across the lake, golden hour Prompt: A park in the spring next to a lake, the sun sets across the lake, golden hour, red wildflowers Imagen models can transform your ideas into detailed images, whether your prompts are short or long and detailed. Refine your vision through iterative prompting, adding details until you achieve the perfect result. Short prompts let you generate an image quickly. Prompt: close-up photo of a woman in her 20s, street photography, movie still, muted orange warm tones Longer prompts let you add specific details and build your image. Prompt: captivating photo of a woman in her 20s utilizing a street photography style. The image should look like a movie still with muted orange warm tones. Additional advice for Imagen prompt writing: Use descriptive language : Employ detailed adjectives and adverbs to paint a clear picture for Imagen. Provide context : If necessary, include background information to aid the AI's understanding. Reference specific artists or styles : If you have a particular aesthetic in mind, referencing specific artists or art movements can be helpful. Use prompt engineering tools : Consider exploring prompt engineering tools or resources to help you refine your prompts and achieve optimal results. 
Enhancing the facial details in your personal and group images : Specify facial details as a focus of the photo (for example, use the word "portrait" in the prompt). Generate text in images Imagen models can add text into images, opening up more creative image generation possibilities. Use the following guidance to get the most out of this feature: Iterate with confidence : You might have to regenerate images until you achieve the look you want. Imagen's text integration is still evolving, and sometimes multiple attempts yield the best results. Keep it short : Limit text to 25 characters or less for optimal generation. Multiple phrases : Experiment with two or three distinct phrases to provide additional information. Avoid exceeding three phrases for cleaner compositions. Prompt: A poster with the text "Summerland" in bold font as a title, underneath this text is the slogan "Summer never felt so good" Guide Placement : While Imagen can attempt to position text as directed, expect occasional variations. This feature is continually improving. Inspire font style : Specify a general font style to subtly influence Imagen's choices. Don't rely on precise font replication, but expect creative interpretations. Font size : Specify a font size or a general indication of size (for example, small , medium , large ) to influence the font size generation. Prompt parameterization To better control output results, you might find it helpful to parameterize the inputs into Imagen. For example, suppose you want your customers to be able to generate logos for their business, and you want to make sure logos are always generated on a solid color background. You also want to limit the options that the client can select from a menu. In this example, you can create a parameterized prompt similar to the following: A {logo_style} logo for a {company_area} company on a solid color background. Include the text {company_name} . 
In your custom user interface, the customer can input the parameters using a menu, and their chosen values populate the prompt Imagen receives. For example: Prompt: A minimalist logo for a health care company on a solid color background. Include the text Journey. Prompt: A modern logo for a software company on a solid color background. Include the text Silo. Prompt: A traditional logo for a baking company on a solid color background. Include the text Seed. Advanced prompt writing techniques Use the following examples to create more specific prompts based on attributes like photography descriptors, shapes and materials, historical art movements, and image quality modifiers. Photography Prompt includes: "A photo of..." To use this style, start by using keywords that clearly tell Imagen that you're looking for a photograph. Start your prompts with "A photo of...". For example: Prompt: A photo of coffee beans in a kitchen on a wooden surface Prompt: A photo of a chocolate bar on a kitchen counter Prompt: A photo of a modern building with water in the background Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Photography modifiers In the following examples, you can see several photography-specific modifiers and parameters. You can combine multiple modifiers for more precise control.
Camera Proximity - Close up, taken from far away Prompt: A close-up photo of coffee beans Prompt: A zoomed out photo of a small bag of coffee beans in a messy kitchen Camera Position - aerial, from below Prompt: aerial photo of urban city with skyscrapers Prompt: A photo of a forest canopy with blue skies from below Lighting - natural, dramatic, warm, cold Prompt: studio photo of a modern arm chair, natural lighting Prompt: studio photo of a modern arm chair, dramatic lighting Camera Settings - motion blur, soft focus, bokeh, portrait Prompt: photo of a city with skyscrapers from the inside of a car with motion blur Prompt: soft focus photograph of a bridge in an urban city at night Lens types - 35mm, 50mm, fisheye, wide angle, macro Prompt: photo of a leaf, macro lens Prompt: street photography, new york city, fisheye lens Film types - black and white, polaroid Prompt: a polaroid portrait of a dog wearing sunglasses Prompt: black and white photo of a dog wearing sunglasses Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Illustration and art Prompt includes: "A painting of..." , "A sketch of..." Art styles vary from monochrome styles like pencil sketches, to hyper-realistic digital art. For example, the following images use the same prompt with different styles: "An [art style or creation technique] of an angular sporty electric sedan with skyscrapers in the background" Prompt: A technical pencil drawing of an angular... Prompt: A charcoal drawing of an angular... Prompt: A color pencil drawing of an angular... Prompt: A pastel painting of an angular... Prompt: A digital art of an angular... Prompt: An art deco (poster) of an angular... Image source: Each image was generated using its corresponding text prompt with the Imagen 2 model. Shapes and materials Prompt includes: "...made of..." , "...in the shape of..." 
One of the strengths of this technology is that you can create imagery that is otherwise difficult or impossible. For example, you can recreate your company logo in different materials and textures. Prompt: a duffle bag made of cheese Prompt: neon tubes in the shape of a bird Prompt: an armchair made of paper , studio photo, origami style Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Historical art references Prompt includes: "...in the style of..." Certain styles have become iconic over the years. The following are some ideas of historical painting or art styles that you can try. "generate an image in the style of [art period or movement] : a wind farm" Prompt: generate an image in the style of an impressionist painting : a wind farm Prompt: generate an image in the style of a renaissance painting : a wind farm Prompt: generate an image in the style of pop art : a wind farm Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Image quality modifiers Certain keywords can let the model know that you're looking for a high-quality asset. Examples of quality modifiers include the following: General Modifiers - high-quality, beautiful, stylized Photos - 4K, HDR, Studio Photo Art, Illustration - by a professional, detailed The following are a few examples of prompts without quality modifiers and the same prompt with quality modifiers. Prompt (no quality modifiers): a photo of a corn stalk Prompt (with quality modifiers): 4k HDR beautiful photo of a corn stalk taken by a professional photographer Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Aspect ratios Imagen image generation lets you set five distinct image aspect ratios. Square (1:1, default) - A standard square photo. Common uses for this aspect ratio include social media posts. Fullscreen (4:3) - This aspect ratio is commonly used in media or film. 
It also matches the dimensions of most older (non-widescreen) TVs and medium format cameras. It captures more of the scene horizontally (compared to 1:1), making it a preferred aspect ratio for photography. Prompt: close up of a musician's fingers playing the piano, black and white film, vintage (4:3 aspect ratio) Prompt: A professional studio photo of french fries for a high end restaurant, in the style of a food magazine (4:3 aspect ratio) Portrait full screen (3:4) - This is the fullscreen aspect ratio rotated 90 degrees. This lets you capture more of the scene vertically compared to the 1:1 aspect ratio. Prompt: a woman hiking, close of her boots reflected in a puddle, large mountains in the background, in the style of an advertisement, dramatic angles (3:4 aspect ratio) Prompt: aerial shot of a river flowing up a mystical valley (3:4 aspect ratio) Widescreen (16:9) - This ratio has replaced 4:3 and is now the most common aspect ratio for TVs, monitors, and mobile phone screens (landscape). Use this aspect ratio when you want to capture more of the background (for example, scenic landscapes). Prompt: a man wearing all white clothing sitting on the beach, close up, golden hour lighting (16:9 aspect ratio) Portrait (9:16) - This ratio is widescreen but rotated. This is a relatively new aspect ratio that has been popularized by short form video apps (for example, YouTube Shorts). Use this for tall objects with strong vertical orientations such as buildings, trees, waterfalls, or other similar objects. Prompt: a digital render of a massive skyscraper, modern, grand, epic with a beautiful sunset in the background (9:16 aspect ratio) Photorealistic images Different versions of the image generation model might offer a mix of artistic and photorealistic output. Use the following wording in prompts to generate more photorealistic output, based on the subject you want to generate. Note: Take these keywords as general guidance when you try to create photorealistic images.
They aren't required to achieve your goal.

Use case | Lens type | Focal lengths | Additional details
People (portraits) | Prime, zoom | 24-35mm | black and white film, Film noir, Depth of field, duotone (mention two colors)
Food, insects, plants (objects, still life) | Macro | 60-105mm | High detail, precise focusing, controlled lighting
Sports, wildlife (motion) | Telephoto zoom | 100-400mm | Fast shutter speed, Action or movement tracking
Astronomical, landscape (wide-angle) | Wide-angle | 10-24mm | Long exposure times, sharp focus, long exposure, smooth water or clouds

Portraits

Use case | Lens type | Focal lengths | Additional details
People (portraits) | Prime, zoom | 24-35mm | black and white film, Film noir, Depth of field, duotone (mention two colors)

Using several keywords from the table, Imagen can generate the following portraits:

Prompt: A woman, 35mm portrait, blue and grey duotones
Model: imagen-3.0-generate-002

Prompt: A woman, 35mm portrait, film noir
Model: imagen-3.0-generate-002

Objects

Use case | Lens type | Focal lengths | Additional details
Food, insects, plants (objects, still life) | Macro | 60-105mm | High detail, precise focusing, controlled lighting

Using several keywords from the table, Imagen can generate the following object images:

Prompt: leaf of a prayer plant, macro lens, 60mm
Model: imagen-3.0-generate-002

Prompt: a plate of pasta, 100mm Macro lens
Model: imagen-3.0-generate-002

Motion

Use case | Lens type | Focal lengths | Additional details
Sports, wildlife (motion) | Telephoto zoom | 100-400mm | Fast shutter speed, Action or movement tracking

Using several keywords from the table, Imagen can generate the following motion images:

Prompt: a winning touchdown, fast shutter speed, movement tracking
Model: imagen-3.0-generate-002

Prompt: A deer running in the forest, fast shutter speed, movement tracking
Model: imagen-3.0-generate-002

Wide-angle

Use case | Lens type | Focal lengths | Additional details
Astronomical, landscape (wide-angle) | Wide-angle | 10-24mm | Long exposure times, sharp focus, long exposure, smooth water or clouds

Using several keywords from the table, Imagen can generate the following wide-angle images:

Prompt: an expansive mountain range, landscape wide angle 10mm
Model: imagen-3.0-generate-002

Prompt: a photo of the moon, astro photography, wide angle 10mm
Model: imagen-3.0-generate-002

What's next

Check out the Veo guide to learn how to generate videos with the Gemini API.
To learn more about Gemini models, see Gemini models and Experimental models.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
|
text_content/docs_image-generation_56ff9b7f.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/image-generation#imagen
|
| 2 |
+
Title: Image generation | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Image generation

You can generate images using the Gemini API with either Gemini's built-in multimodal capabilities or Imagen, Google's specialized image generation models. For most use cases, start with Gemini. Choose Imagen for specialized tasks where image quality is critical. See the Choosing the right model section for more guidance.

All generated images include a SynthID watermark.

Before you begin

Ensure you use a supported model and version for image generation:

For Gemini, use Gemini 2.0 Flash Preview Image Generation.
For Imagen, use one of the Imagen models (Imagen 3, Imagen 4, or Imagen 4 Ultra). Note that these models are only available on the Paid tier.

You can access both Gemini and Imagen models using the same libraries.

Note: Image generation may not be available in all regions and countries. Review the Models page for more information.

Generate images using Gemini

Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both to achieve various image-related tasks, such as image generation and editing.

You must include responseModalities: ["TEXT", "IMAGE"] in your configuration. Image-only output is not supported with these models.

Image generation (text-to-image)

The following code demonstrates how to generate an image based on a descriptive prompt:

Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client()

contents = ('Hi, can you create a 3d rendered image of a pig '
            'with wings and a top hat flying over a happy '
            'futuristic scifi city with lots of greenery?')

response = client.models.generate_content(
    model="gemini-2.0-flash-preview-image-generation",
    contents=contents,
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

# The response interleaves text parts and inline image parts.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save('gemini-native-image.png')
        image.show()

JavaScript

Note: We've released the Google SDK for TypeScript and JavaScript in preview launch stage. Use this SDK for image generation features.

import { GoogleGenAI, Modality } from "@google/genai";
import * as fs from "node:fs";

async function main() {
  const ai = new GoogleGenAI({});

  const contents =
    "Hi, can you create a 3d rendered image of a pig " +
    "with wings and a top hat flying over a happy " +
    "futuristic scifi city with lots of greenery?";

  // Set responseModalities to include "Image" so the model can generate an image
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash-preview-image-generation",
    contents: contents,
    config: {
      responseModalities: [Modality.TEXT, Modality.IMAGE],
    },
  });

  for (const part of response.candidates[0].content.parts) {
    // Based on the part type, either show the text or save the image
    if (part.text) {
      console.log(part.text);
    } else if (part.inlineData) {
      const imageData = part.inlineData.data;
      const buffer = Buffer.from(imageData, "base64");
      fs.writeFileSync("gemini-native-image.png", buffer);
      console.log("Image saved as gemini-native-image.png");
    }
  }
}

main();

Go

package main

import (
  "context"
  "fmt"
  "log"
  "os"

  "google.golang.org/genai"
)

func main() {
  ctx := context.Background()
  client, err := genai.NewClient(ctx, nil)
  if err != nil {
    log.Fatal(err)
  }

  config := &genai.GenerateContentConfig{
    ResponseModalities: []string{"TEXT", "IMAGE"},
  }

  result, err := client.Models.GenerateContent(
    ctx,
    "gemini-2.0-flash-preview-image-generation",
    genai.Text("Hi, can you create a 3d rendered image of a pig "+
      "with wings and a top hat flying over a happy "+
      "futuristic scifi city with lots of greenery?"),
    config,
  )
  if err != nil {
    log.Fatal(err)
  }

  // Print text parts and write image parts to disk.
  for _, part := range result.Candidates[0].Content.Parts {
    if part.Text != "" {
      fmt.Println(part.Text)
    } else if part.InlineData != nil {
      imageBytes := part.InlineData.Data
      outputFilename := "gemini_generated_image.png"
      if err := os.WriteFile(outputFilename, imageBytes, 0644); err != nil {
        log.Fatal(err)
      }
    }
  }
}

REST

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"}
      ]
    }],
    "generationConfig":{"responseModalities":["TEXT","IMAGE"]}
  }' \
  | grep -o '"data": "[^"]*"' \
  | cut -d '"' -f4 \
  | base64 --decode > gemini-native-image.png

AI-generated image of a fantastical flying pig

Image editing (text-and-image-to-image)

To perform image editing, add an image as input. The following example demonstrates uploading base64 encoded images. For multiple images and larger payloads, check the image input section.

Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

image = Image.open('/path/to/image.png')

client = genai.Client()

# A single string: adjacent literals are concatenated.
text_input = ('Hi, This is a picture of me. '
              'Can you add a llama next to me?')

response = client.models.generate_content(
    model="gemini-2.0-flash-preview-image-generation",
    contents=[text_input, image],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        edited_image = Image.open(BytesIO(part.inline_data.data))
        edited_image.show()

JavaScript

Note: We've released the Google SDK for TypeScript and JavaScript in preview launch stage. Use this SDK for image generation features.

import { GoogleGenAI, Modality } from "@google/genai";
import * as fs from "node:fs";

async function main() {
  const ai = new GoogleGenAI({});

  // Load the image from the local file system
  const imagePath = "path/to/image.png";
  const imageData = fs.readFileSync(imagePath);
  const base64Image = imageData.toString("base64");

  // Prepare the content parts
  const contents = [
    { text: "Can you add a llama next to the image?" },
    {
      inlineData: {
        mimeType: "image/png",
        data: base64Image,
      },
    },
  ];

  // Set responseModalities to include "Image" so the model can generate an image
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash-preview-image-generation",
    contents: contents,
    config: {
      responseModalities: [Modality.TEXT, Modality.IMAGE],
    },
  });

  for (const part of response.candidates[0].content.parts) {
    // Based on the part type, either show the text or save the image
    if (part.text) {
      console.log(part.text);
    } else if (part.inlineData) {
      const imageData = part.inlineData.data;
      const buffer = Buffer.from(imageData, "base64");
      fs.writeFileSync("gemini-native-image.png", buffer);
      console.log("Image saved as gemini-native-image.png");
    }
  }
}

main();

Go

package main

import (
  "context"
  "fmt"
  "log"
  "os"

  "google.golang.org/genai"
)

func main() {
  ctx := context.Background()
  client, err := genai.NewClient(ctx, nil)
  if err != nil {
    log.Fatal(err)
  }

  imagePath := "/path/to/image.png"
  imgData, err := os.ReadFile(imagePath)
  if err != nil {
    log.Fatal(err)
  }

  parts := []*genai.Part{
    genai.NewPartFromText("Hi, This is a picture of me. Can you add a llama next to me?"),
    &genai.Part{
      InlineData: &genai.Blob{
        MIMEType: "image/png",
        Data:     imgData,
      },
    },
  }

  contents := []*genai.Content{
    genai.NewContentFromParts(parts, genai.RoleUser),
  }

  config := &genai.GenerateContentConfig{
    ResponseModalities: []string{"TEXT", "IMAGE"},
  }

  result, err := client.Models.GenerateContent(
    ctx,
    "gemini-2.0-flash-preview-image-generation",
    contents,
    config,
  )
  if err != nil {
    log.Fatal(err)
  }

  for _, part := range result.Candidates[0].Content.Parts {
    if part.Text != "" {
      fmt.Println(part.Text)
    } else if part.InlineData != nil {
      imageBytes := part.InlineData.Data
      outputFilename := "gemini_generated_image.png"
      if err := os.WriteFile(outputFilename, imageBytes, 0644); err != nil {
        log.Fatal(err)
      }
    }
  }
}

REST

IMG_PATH=/path/to/your/image1.jpeg

if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
  B64FLAGS="--input"
else
  B64FLAGS="-w0"
fi

IMG_BASE64=$(base64 "$B64FLAGS" "$IMG_PATH" 2>&1)

curl -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d "{
    \"contents\": [{
      \"parts\":[
        {\"text\": \"Hi, This is a picture of me. Can you add a llama next to me?\"},
        {
          \"inline_data\": {
            \"mime_type\":\"image/jpeg\",
            \"data\": \"$IMG_BASE64\"
          }
        }
      ]
    }],
    \"generationConfig\": {\"responseModalities\": [\"TEXT\", \"IMAGE\"]}
  }" \
  | grep -o '"data": "[^"]*"' \
  | cut -d '"' -f4 \
  | base64 --decode > gemini-edited-image.png

Other image generation modes

Gemini supports other image interaction modes based on prompt structure and context, including:

Text to image(s) and text (interleaved): Outputs images with related text. Example prompt: "Generate an illustrated recipe for a paella."
Image(s) and text to image(s) and text (interleaved): Uses input images and text to create new related images and text. Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? Can you update the image?"
Multi-turn image editing (chat): Keep generating and editing images conversationally. Example prompts: [upload an image of a blue car.], "Turn this car into a convertible.", "Now change the color to yellow."

Limitations

For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN.
Image generation does not support audio or video inputs.
Image generation may not always trigger: The model may output text only. Try asking for image outputs explicitly (e.g. "generate an image", "provide images as you go along", "update the image"). The model may stop generating partway through. Try again or try a different prompt.
When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text.
There are some regions/countries where image generation is not available. See Models for more information.

Generate images using the Imagen models

This example demonstrates generating images with an Imagen model:

Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client()

response = client.models.generate_images(
    model='imagen-4.0-generate-preview-06-06',
    prompt='Robot holding a red skateboard',
    config=types.GenerateImagesConfig(
        number_of_images=4,
    )
)

for generated_image in response.generated_images:
    generated_image.image.show()

JavaScript

import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

async function main() {
  const ai = new GoogleGenAI({});

  const response = await ai.models.generateImages({
    model: 'imagen-4.0-generate-preview-06-06',
    prompt: 'Robot holding a red skateboard',
    config: {
      numberOfImages: 4,
    },
  });

  // Write each generated image to its own numbered file.
  let idx = 1;
  for (const generatedImage of response.generatedImages) {
    let imgBytes = generatedImage.image.imageBytes;
    const buffer = Buffer.from(imgBytes, "base64");
    fs.writeFileSync(`imagen-${idx}.png`, buffer);
    idx++;
  }
}

main();

Go

package main

import (
  "context"
  "fmt"
  "log"
  "os"

  "google.golang.org/genai"
)

func main() {
  ctx := context.Background()
  client, err := genai.NewClient(ctx, nil)
  if err != nil {
    log.Fatal(err)
  }

  config := &genai.GenerateImagesConfig{
    NumberOfImages: 4,
  }

  response, err := client.Models.GenerateImages(
    ctx,
    "imagen-4.0-generate-preview-06-06",
    "Robot holding a red skateboard",
    config,
  )
  if err != nil {
    log.Fatal(err)
  }

  for n, image := range response.GeneratedImages {
    fname := fmt.Sprintf("imagen-%d.png", n)
    if err := os.WriteFile(fname, image.Image.ImageBytes, 0644); err != nil {
      log.Fatal(err)
    }
  }
}

REST

curl -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/imagen-4.0-generate-preview-06-06:predict" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "instances": [
      {
        "prompt": "Robot holding a red skateboard"
      }
    ],
    "parameters": {
      "sampleCount": 4
    }
  }'

AI-generated image of a robot holding a red skateboard

Imagen configuration

Imagen supports English-only prompts at this time, along with the following parameters:

Note: Naming conventions of parameters vary by programming language.

numberOfImages: The number of images to generate, from 1 to 4 (inclusive). The default is 4. For Imagen 4 Ultra, it defaults to 1 because only one image can be generated at a time.
aspectRatio: Changes the aspect ratio of the generated image. Supported values are "1:1", "3:4", "4:3", "9:16", and "16:9". The default is "1:1".
personGeneration: Allow the model to generate images of people.
The following values are supported: "dont_allow": Block generation of images of people. "allow_adult": Generate images of adults, but not children. This is the default. "allow_all": Generate images that include adults and children. Note: The "allow_all" parameter value is not allowed in EU, UK, CH, or MENA locations. Choosing the right model Choose Gemini when: You need contextually relevant images that leverage world knowledge and reasoning. Seamlessly blending text and images is important. You want accurate visuals embedded within long text sequences. You want to edit images conversationally while maintaining context. Choose Imagen when: Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities. You are performing specialized editing tasks like product background updates or image upscaling. You are infusing branding or style, or generating logos and product designs. Imagen 4 should be your go-to model when you start generating images with Imagen. Choose Imagen 4 Ultra for advanced use cases or when you need the best image quality. Note that Imagen 4 Ultra can only generate one image at a time. Imagen prompt guide This section of the Imagen guide shows you how modifying a text-to-image prompt can produce different results, along with examples of images you can create. Prompt writing basics Note: Maximum prompt length is 480 tokens. A good prompt is descriptive and clear, and makes use of meaningful keywords and modifiers. Start by thinking of your subject, context, and style. Image text: A sketch (style) of a modern apartment building (subject) surrounded by skyscrapers (context and background). Subject: The first thing to think about with any prompt is the subject: the object, person, animal, or scenery you want an image of. Context and background: Just as important is the background or context in which the subject will be placed. Try placing your subject in a variety of backgrounds.
For example, a studio with a white background, outdoors, or indoor environments. Style: Finally, add the style of image you want. Styles can be general (painting, photograph, sketches) or very specific (pastel painting, charcoal drawing, isometric 3D). You can also combine styles. After you write a first version of your prompt, refine your prompt by adding more details until you get to the image that you want. Iteration is important. Start by establishing your core idea, and then refine and expand upon that core idea until the generated image is close to your vision. Prompt: A park in the spring next to a lake Prompt: A park in the spring next to a lake, the sun sets across the lake, golden hour Prompt: A park in the spring next to a lake, the sun sets across the lake, golden hour, red wildflowers Imagen models can transform your ideas into detailed images, whether your prompts are short or long and detailed. Refine your vision through iterative prompting, adding details until you achieve the perfect result. Short prompts let you generate an image quickly. Prompt: close-up photo of a woman in her 20s, street photography, movie still, muted orange warm tones Longer prompts let you add specific details and build your image. Prompt: captivating photo of a woman in her 20s utilizing a street photography style. The image should look like a movie still with muted orange warm tones. Additional advice for Imagen prompt writing: Use descriptive language : Employ detailed adjectives and adverbs to paint a clear picture for Imagen. Provide context : If necessary, include background information to aid the AI's understanding. Reference specific artists or styles : If you have a particular aesthetic in mind, referencing specific artists or art movements can be helpful. Use prompt engineering tools : Consider exploring prompt engineering tools or resources to help you refine your prompts and achieve optimal results. 
Enhancing the facial details in your personal and group images : Specify facial details as a focus of the photo (for example, use the word "portrait" in the prompt). Generate text in images Imagen models can add text into images, opening up more creative image generation possibilities. Use the following guidance to get the most out of this feature: Iterate with confidence : You might have to regenerate images until you achieve the look you want. Imagen's text integration is still evolving, and sometimes multiple attempts yield the best results. Keep it short : Limit text to 25 characters or less for optimal generation. Multiple phrases : Experiment with two or three distinct phrases to provide additional information. Avoid exceeding three phrases for cleaner compositions. Prompt: A poster with the text "Summerland" in bold font as a title, underneath this text is the slogan "Summer never felt so good" Guide Placement : While Imagen can attempt to position text as directed, expect occasional variations. This feature is continually improving. Inspire font style : Specify a general font style to subtly influence Imagen's choices. Don't rely on precise font replication, but expect creative interpretations. Font size : Specify a font size or a general indication of size (for example, small , medium , large ) to influence the font size generation. Prompt parameterization To better control output results, you might find it helpful to parameterize the inputs into Imagen. For example, suppose you want your customers to be able to generate logos for their business, and you want to make sure logos are always generated on a solid color background. You also want to limit the options that the client can select from a menu. In this example, you can create a parameterized prompt similar to the following: A {logo_style} logo for a {company_area} company on a solid color background. Include the text {company_name} . 
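The parameterized prompt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the option tuples, the `build_logo_prompt` helper, and the choice to enforce the 25-character text guidance in code are hypothetical and not part of the Gemini API or its SDKs.

```python
# Hypothetical menu options; the guide only shows these as example values.
LOGO_STYLES = ("minimalist", "modern", "traditional")
COMPANY_AREAS = ("health care", "software", "baking")

# The "Keep it short" guidance suggests at most 25 characters of text.
MAX_TEXT_CHARS = 25

PROMPT_TEMPLATE = (
    "A {logo_style} logo for a {company_area} company "
    "on a solid color background. Include the text {company_name}."
)


def build_logo_prompt(logo_style: str, company_area: str, company_name: str) -> str:
    """Fill the template, accepting only values offered in the menu."""
    if logo_style not in LOGO_STYLES:
        raise ValueError(f"unsupported logo style: {logo_style!r}")
    if company_area not in COMPANY_AREAS:
        raise ValueError(f"unsupported company area: {company_area!r}")
    if not 0 < len(company_name) <= MAX_TEXT_CHARS:
        raise ValueError(f"company name must be 1-{MAX_TEXT_CHARS} characters")
    return PROMPT_TEMPLATE.format(
        logo_style=logo_style,
        company_area=company_area,
        company_name=company_name,
    )


print(build_logo_prompt("minimalist", "health care", "Journey"))
```

Restricting the style and area to fixed tuples keeps the solid-background wording out of the customer's hands, which is the point of parameterizing the prompt. The returned string would then be passed as the `prompt` argument of an image generation call.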
In your custom user interface, the customer can input the parameters using a menu, and their chosen values populate the prompt Imagen receives. For example: Prompt: A minimalist logo for a health care company on a solid color background. Include the text Journey. Prompt: A modern logo for a software company on a solid color background. Include the text Silo. Prompt: A traditional logo for a baking company on a solid color background. Include the text Seed. Advanced prompt writing techniques Use the following examples to create more specific prompts based on attributes like photography descriptors, shapes and materials, historical art movements, and image quality modifiers. Photography Prompt includes: "A photo of..." To use this style, start by using keywords that clearly tell Imagen that you're looking for a photograph. Start your prompts with "A photo of...". For example: Prompt: A photo of coffee beans in a kitchen on a wooden surface Prompt: A photo of a chocolate bar on a kitchen counter Prompt: A photo of a modern building with water in the background Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Photography modifiers In the following examples, you can see several photography-specific modifiers and parameters. You can combine multiple modifiers for more precise control.
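As a rough sketch of combining modifiers, the helper below joins a subject with any number of photography modifiers (lighting, lens type, camera settings, and so on) into a single comma-separated prompt. The `build_photo_prompt` name and the comma-joined layout are assumptions made for illustration; Imagen does not require any particular structure.

```python
def build_photo_prompt(subject: str, *modifiers: str) -> str:
    """Prefix the subject with "A photo of" and append non-empty modifiers."""
    parts = [f"A photo of {subject.strip()}"]
    # Skip blank modifiers so optional menu fields can be left empty.
    parts += [m.strip() for m in modifiers if m.strip()]
    return ", ".join(parts)


prompt = build_photo_prompt(
    "a modern arm chair",
    "dramatic lighting",  # lighting
    "macro lens",         # lens type
    "soft focus",         # camera setting
)
print(prompt)
# A photo of a modern arm chair, dramatic lighting, macro lens, soft focus
```

Keeping each modifier as a separate argument makes it easy to vary one attribute at a time and compare the generated images.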
Camera Proximity - Close up, taken from far away Prompt: A close-up photo of coffee beans Prompt: A zoomed out photo of a small bag of coffee beans in a messy kitchen Camera Position - aerial, from below Prompt: aerial photo of urban city with skyscrapers Prompt: A photo of a forest canopy with blue skies from below Lighting - natural, dramatic, warm, cold Prompt: studio photo of a modern arm chair, natural lighting Prompt: studio photo of a modern arm chair, dramatic lighting Camera Settings - motion blur, soft focus, bokeh, portrait Prompt: photo of a city with skyscrapers from the inside of a car with motion blur Prompt: soft focus photograph of a bridge in an urban city at night Lens types - 35mm, 50mm, fisheye, wide angle, macro Prompt: photo of a leaf, macro lens Prompt: street photography, new york city, fisheye lens Film types - black and white, polaroid Prompt: a polaroid portrait of a dog wearing sunglasses Prompt: black and white photo of a dog wearing sunglasses Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Illustration and art Prompt includes: "A painting of..." , "A sketch of..." Art styles vary from monochrome styles like pencil sketches, to hyper-realistic digital art. For example, the following images use the same prompt with different styles: "An [art style or creation technique] of an angular sporty electric sedan with skyscrapers in the background" Prompt: A technical pencil drawing of an angular... Prompt: A charcoal drawing of an angular... Prompt: A color pencil drawing of an angular... Prompt: A pastel painting of an angular... Prompt: A digital art of an angular... Prompt: An art deco (poster) of an angular... Image source: Each image was generated using its corresponding text prompt with the Imagen 2 model. Shapes and materials Prompt includes: "...made of..." , "...in the shape of..." 
One of the strengths of this technology is that you can create imagery that is otherwise difficult or impossible. For example, you can recreate your company logo in different materials and textures. Prompt: a duffle bag made of cheese Prompt: neon tubes in the shape of a bird Prompt: an armchair made of paper , studio photo, origami style Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Historical art references Prompt includes: "...in the style of..." Certain styles have become iconic over the years. The following are some ideas of historical painting or art styles that you can try. "generate an image in the style of [art period or movement] : a wind farm" Prompt: generate an image in the style of an impressionist painting : a wind farm Prompt: generate an image in the style of a renaissance painting : a wind farm Prompt: generate an image in the style of pop art : a wind farm Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Image quality modifiers Certain keywords can let the model know that you're looking for a high-quality asset. Examples of quality modifiers include the following: General Modifiers - high-quality, beautiful, stylized Photos - 4K, HDR, Studio Photo Art, Illustration - by a professional, detailed The following are a few examples of prompts without quality modifiers and the same prompt with quality modifiers. Prompt (no quality modifiers): a photo of a corn stalk Prompt (with quality modifiers): 4k HDR beautiful photo of a corn stalk taken by a professional photographer Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Aspect ratios Imagen image generation lets you set five distinct image aspect ratios. Square (1:1, default) - A standard square photo. Common uses for this aspect ratio include social media posts. Fullscreen (4:3) - This aspect ratio is commonly used in media or film. 
It is also the aspect ratio of most old (non-widescreen) TVs and medium format cameras. It captures more of the scene horizontally (compared to 1:1), making it a preferred aspect ratio for photography. Prompt: close up of a musician's fingers playing the piano, black and white film, vintage (4:3 aspect ratio) Prompt: A professional studio photo of french fries for a high end restaurant, in the style of a food magazine (4:3 aspect ratio) Portrait full screen (3:4) - This is the fullscreen aspect ratio rotated 90 degrees. This lets you capture more of the scene vertically compared to the 1:1 aspect ratio. Prompt: a woman hiking, close of her boots reflected in a puddle, large mountains in the background, in the style of an advertisement, dramatic angles (3:4 aspect ratio) Prompt: aerial shot of a river flowing up a mystical valley (3:4 aspect ratio) Widescreen (16:9) - This ratio has replaced 4:3 and is now the most common aspect ratio for TVs, monitors, and mobile phone screens (landscape). Use this aspect ratio when you want to capture more of the background (for example, scenic landscapes). Prompt: a man wearing all white clothing sitting on the beach, close up, golden hour lighting (16:9 aspect ratio) Portrait (9:16) - This ratio is widescreen but rotated. This is a relatively new aspect ratio that has been popularized by short form video apps (for example, YouTube Shorts). Use this for tall objects with strong vertical orientations such as buildings, trees, waterfalls, or other similar objects. Prompt: a digital render of a massive skyscraper, modern, grand, epic with a beautiful sunset in the background (9:16 aspect ratio) Photorealistic images Different versions of the image generation model might offer a mix of artistic and photorealistic output. Use the following wording in prompts to generate more photorealistic output, based on the subject you want to generate. Note: Take these keywords as general guidance when you try to create photorealistic images. 
They aren't required to achieve your goal. Use case Lens type Focal lengths Additional details People (portraits) Prime, zoom 24-35mm black and white film, Film noir, Depth of field, duotone (mention two colors) Food, insects, plants (objects, still life) Macro 60-105mm High detail, precise focusing, controlled lighting Sports, wildlife (motion) Telephoto zoom 100-400mm Fast shutter speed, Action or movement tracking Astronomical, landscape (wide-angle) Wide-angle 10-24mm Long exposure times, sharp focus, long exposure, smooth water or clouds Portraits Use case Lens type Focal lengths Additional details People (portraits) Prime, zoom 24-35mm black and white film, Film noir, Depth of field, duotone (mention two colors) Using several keywords from the table, Imagen can generate the following portraits: Prompt: A woman, 35mm portrait, blue and grey duotones Model: imagen-3.0-generate-002 Prompt: A woman, 35mm portrait, film noir Model: imagen-3.0-generate-002 Objects Use case Lens type Focal lengths Additional details Food, insects, plants (objects, still life) Macro 60-105mm High detail, precise focusing, controlled lighting Using several keywords from the table, Imagen can generate the following object images: Prompt: leaf of a prayer plant, macro lens, 60mm Model: imagen-3.0-generate-002 Prompt: a plate of pasta, 100mm Macro lens Model: imagen-3.0-generate-002 Motion Use case Lens type Focal lengths Additional details Sports, wildlife (motion) Telephoto zoom 100-400mm Fast shutter speed, Action or movement tracking Using several keywords from the table, Imagen can generate the following motion images: Prompt: a winning touchdown, fast shutter speed, movement tracking Model: imagen-3.0-generate-002 Prompt: A deer running in the forest, fast shutter speed, movement tracking Model: imagen-3.0-generate-002 Wide-angle Use case Lens type Focal lengths Additional details Astronomical, landscape (wide-angle) Wide-angle 10-24mm Long exposure times, sharp focus, long 
exposure, smooth water or clouds Using several keywords from the table, Imagen can generate the following wide-angle images: Prompt: an expansive mountain range, landscape wide angle 10mm Model: imagen-3.0-generate-002 Prompt: a photo of the moon, astro photography, wide angle 10mm Model: imagen-3.0-generate-002 What's next Check out the Veo guide to learn how to generate videos with the Gemini API. To learn more about Gemini models, see Gemini models and Experimental models . Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
|
text_content/docs_image-generation_74d7b467.txt
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/image-generation#gemini
|
| 2 |
+
Title: Image generation | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Image generation You can generate images using the Gemini API with either Gemini's built-in multimodal capabilities or Imagen, Google's specialized image generation models. For most use cases, start with Gemini . Choose Imagen for specialized tasks where image quality is critical. See the Choosing the right model section for more guidance. All generated images include a SynthID watermark . Before you begin Ensure you use a supported model and version for image generation: For Gemini , use Gemini 2.0 Flash Preview Image Generation. For Imagen , use one of the Imagen models (Imagen 3, Imagen 4 or Imagen 4 Ultra). Note that those models are only available on the Paid tier . You can access both Gemini and Imagen models using the same libraries. Note: Image generation may not be available in all regions and countries; review our Models page for more information. Generate images using Gemini Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both to achieve various image-related tasks, such as image generation and editing. You must include responseModalities : ["TEXT", "IMAGE"] in your configuration. Image-only output is not supported with these models. Image generation (text-to-image) The following code demonstrates how to generate an image based on a descriptive prompt: Python from google import genai from google.genai import types from PIL import Image from io import BytesIO import base64 client = genai . 
Client () contents = ( 'Hi, can you create a 3d rendered image of a pig ' 'with wings and a top hat flying over a happy ' 'futuristic scifi city with lots of greenery?' ) response = client . models . generate_content ( model = "gemini-2.0-flash-preview-image-generation" , contents = contents , config = types . GenerateContentConfig ( response_modalities = [ 'TEXT' , 'IMAGE' ] ) ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO (( part . inline_data . data ))) image . save ( 'gemini-native-image.png' ) image . show () JavaScript Note: We've released the Google SDK for TypeScript and JavaScript in preview launch stage . Use this SDK for image generation features. import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const contents = "Hi, can you create a 3d rendered image of a pig " + "with wings and a top hat flying over a happy " + "futuristic scifi city with lots of greenery?" ; // Set responseModalities to include "Image" so the model can generate an image const response = await ai . models . generateContent ({ model : "gemini-2.0-flash-preview-image-generation" , contents : contents , config : { responseModalities : [ Modality . TEXT , Modality . IMAGE ], }, }); for ( const part of response . candidates [ 0 ]. content . parts ) { // Based on the part type, either show the text or save the image if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "log" "os" "google.golang.org/genai" ) func main () { ctx := context . 
Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateContentConfig { ResponseModalities : [] string { "TEXT" , "IMAGE" }, } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.0-flash-preview-image-generation" , genai . Text ( "Hi, can you create a 3d rendered image of a pig " + "with wings and a top hat flying over a happy " + "futuristic scifi city with lots of greenery?" ), config , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"} ] }], "generationConfig":{"responseModalities":["TEXT","IMAGE"]} }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-native-image.png AI-generated image of a fantastical flying pig Image editing (text-and-image-to-image) To perform image editing, add an image as input. The following example demonstrates uploading base64 encoded images. For multiple images and larger payloads, check the image input section. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO import PIL.Image image = PIL . Image . open ( '/path/to/image.png' ) client = genai . Client () text_input = ( 'Hi, This is a picture of me. ' 'Can you add a llama next to me?' ) response = client . models . 
generate_content ( model = "gemini-2.0-flash-preview-image-generation" , contents = [ text_input , image ], config = types . GenerateContentConfig ( response_modalities = [ 'TEXT' , 'IMAGE' ] ) ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO (( part . inline_data . data ))) image . show () JavaScript Note: We've released the Google SDK for TypeScript and JavaScript in preview launch stage . Use this SDK for image generation features. import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); // Load the image from the local file system const imagePath = "path/to/image.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); // Prepare the content parts const contents = [ { text : "Can you add a llama next to the image?" }, { inlineData : { mimeType : "image/png" , data : base64Image , }, }, ]; // Set responseModalities to include "Image" so the model can generate an image const response = await ai . models . generateContent ({ model : "gemini-2.0-flash-preview-image-generation" , contents : contents , config : { responseModalities : [ Modality . TEXT , Modality . IMAGE ], }, }); for ( const part of response . candidates [ 0 ]. content . parts ) { // Based on the part type, either show the text or save the image if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "log" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . 
NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/image.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { genai . NewPartFromText ( "Hi, This is a picture of me. Can you add a llama next to me?" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } config := & genai . GenerateContentConfig { ResponseModalities : [] string { "TEXT" , "IMAGE" }, } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.0-flash-preview-image-generation" , contents , config , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/image1.jpeg if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ {\"text\": \"'Hi, This is a picture of me. Can you add a llama next to me\"}, { \"inline_data\": { \"mime_type\":\"image/jpeg\", \"data\": \" $IMG_BASE64 \" } } ] }], \"generationConfig\": {\"responseModalities\": [\"TEXT\", \"IMAGE\"]} }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-edited-image.png Other image generation modes Gemini supports other image interaction modes based on prompt structure and context, including: Text to image(s) and text (interleaved): Outputs images with related text. 
Example prompt: "Generate an illustrated recipe for a paella." Image(s) and text to image(s) and text (interleaved) : Uses input images and text to create new related images and text. Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? can you update the image?" Multi-turn image editing (chat): Keep generating / editing images conversationally. Example prompts: [upload an image of a blue car.] , "Turn this car into a convertible.", "Now change the color to yellow." Limitations For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN. Image generation does not support audio or video inputs. Image generation may not always trigger: The model may output text only. Try asking for image outputs explicitly (e.g. "generate an image", "provide images as you go along", "update the image"). The model may stop generating partway through. Try again or try a different prompt. When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text. There are some regions/countries where Image generation is not available. See Models for more information. Generate images using the Imagen models This example demonstrates generating images with an Imagen model : Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () response = client . models . generate_images ( model = 'imagen-4.0-generate-preview-06-06' , prompt = 'Robot holding a red skateboard' , config = types . GenerateImagesConfig ( number_of_images = 4 , ) ) for generated_image in response . generated_images : generated_image . image . show () JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const response = await ai . models . 
generateImages ({ model : 'imagen-4.0-generate-preview-06-06' , prompt : 'Robot holding a red skateboard' , config : { numberOfImages : 4 , }, }); let idx = 1 ; for ( const generatedImage of response . generatedImages ) { let imgBytes = generatedImage . image . imageBytes ; const buffer = Buffer . from ( imgBytes , "base64" ); fs . writeFileSync ( `imagen-${idx}.png` , buffer ); idx ++ ; } } main (); Go package main import ( "context" "fmt" "log" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateImagesConfig { NumberOfImages : 4 , } response , _ := client . Models . GenerateImages ( ctx , "imagen-4.0-generate-preview-06-06" , "Robot holding a red skateboard" , config , ) for n , image := range response . GeneratedImages { fname := fmt . Sprintf ( "imagen-%d.png" , n ) _ = os . WriteFile ( fname , image . Image . ImageBytes , 0644 ) } } REST curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/imagen-4.0-generate-preview-06-06:predict" \ -H "x-goog-api-key: $GEMINI_API_KEY" \ -H "Content-Type: application/json" \ -d '{ "instances": [ { "prompt": "Robot holding a red skateboard" } ], "parameters": { "sampleCount": 4 } }' AI-generated image of a robot holding a red skateboard Imagen configuration Imagen supports English-only prompts at this time, along with the following parameters: Note: Naming conventions of parameters vary by programming language. numberOfImages : The number of images to generate, from 1 to 4 (inclusive). The default is 4. For Imagen 4 Ultra, it defaults to 1 as only one image can be generated at a time. aspectRatio : Changes the aspect ratio of the generated image. Supported values are "1:1" , "3:4" , "4:3" , "9:16" , and "16:9" . The default is "1:1" . personGeneration : Allow the model to generate images of people. 
The following values are supported: "dont_allow" : Block generation of images of people. "allow_adult" : Generate images of adults, but not children. This is the default. "allow_all" : Generate images that include adults and children. Note: The "allow_all" parameter value is not allowed in EU, UK, CH, and MENA locations. Choosing the right model Choose Gemini when: You need contextually relevant images that leverage world knowledge and reasoning. Seamlessly blending text and images is important. You want accurate visuals embedded within long text sequences. You want to edit images conversationally while maintaining context. Choose Imagen when: Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities. You need to perform specialized editing tasks like product background updates or image upscaling. You want to infuse branding or style, or generate logos and product designs. Imagen 4 should be your go-to model when you start generating images with Imagen. Choose Imagen 4 Ultra for advanced use cases or when you need the best image quality. Note that Imagen 4 Ultra can only generate one image at a time. Imagen prompt guide This section of the Imagen guide shows you how modifying a text-to-image prompt can produce different results, along with examples of images you can create. Prompt writing basics Note: Maximum prompt length is 480 tokens. A good prompt is descriptive and clear, and makes use of meaningful keywords and modifiers. Start by thinking of your subject , context , and style . Image text: A sketch ( style ) of a modern apartment building ( subject ) surrounded by skyscrapers ( context and background ). Subject : The first thing to think about with any prompt is the subject : the object, person, animal, or scenery you want an image of. Context and background: Just as important is the background or context in which the subject will be placed. Try placing your subject in a variety of backgrounds. 
For example, a studio with a white background, outdoors, or indoor environments. Style: Finally, add the style of image you want. Styles can be general (painting, photograph, sketches) or very specific (pastel painting, charcoal drawing, isometric 3D). You can also combine styles. After you write a first version of your prompt, refine your prompt by adding more details until you get to the image that you want. Iteration is important. Start by establishing your core idea, and then refine and expand upon that core idea until the generated image is close to your vision. Prompt: A park in the spring next to a lake Prompt: A park in the spring next to a lake, the sun sets across the lake, golden hour Prompt: A park in the spring next to a lake, the sun sets across the lake, golden hour, red wildflowers Imagen models can transform your ideas into detailed images, whether your prompts are short or long and detailed. Refine your vision through iterative prompting, adding details until you achieve the perfect result. Short prompts let you generate an image quickly. Prompt: close-up photo of a woman in her 20s, street photography, movie still, muted orange warm tones Longer prompts let you add specific details and build your image. Prompt: captivating photo of a woman in her 20s utilizing a street photography style. The image should look like a movie still with muted orange warm tones. Additional advice for Imagen prompt writing: Use descriptive language : Employ detailed adjectives and adverbs to paint a clear picture for Imagen. Provide context : If necessary, include background information to aid the AI's understanding. Reference specific artists or styles : If you have a particular aesthetic in mind, referencing specific artists or art movements can be helpful. Use prompt engineering tools : Consider exploring prompt engineering tools or resources to help you refine your prompts and achieve optimal results. 
Enhancing the facial details in your personal and group images : Specify facial details as a focus of the photo (for example, use the word "portrait" in the prompt). Generate text in images Imagen models can add text into images, opening up more creative image generation possibilities. Use the following guidance to get the most out of this feature: Iterate with confidence : You might have to regenerate images until you achieve the look you want. Imagen's text integration is still evolving, and sometimes multiple attempts yield the best results. Keep it short : Limit text to 25 characters or less for optimal generation. Multiple phrases : Experiment with two or three distinct phrases to provide additional information. Avoid exceeding three phrases for cleaner compositions. Prompt: A poster with the text "Summerland" in bold font as a title, underneath this text is the slogan "Summer never felt so good" Guide Placement : While Imagen can attempt to position text as directed, expect occasional variations. This feature is continually improving. Inspire font style : Specify a general font style to subtly influence Imagen's choices. Don't rely on precise font replication, but expect creative interpretations. Font size : Specify a font size or a general indication of size (for example, small , medium , large ) to influence the font size generation. Prompt parameterization To better control output results, you might find it helpful to parameterize the inputs into Imagen. For example, suppose you want your customers to be able to generate logos for their business, and you want to make sure logos are always generated on a solid color background. You also want to limit the options that the client can select from a menu. In this example, you can create a parameterized prompt similar to the following: A {logo_style} logo for a {company_area} company on a solid color background. Include the text {company_name} . 
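The parameterized prompt above can be sketched in Python as a small builder. This is only an illustration, not part of the Gemini API: the function name `build_logo_prompt`, the menu tuples, and the choice to reject company names longer than 25 characters (borrowed from the "Keep it short" guidance) are all assumptions made for this example.

```python
# Hypothetical menu options; the values mirror the example prompts on this page.
LOGO_STYLES = ("minimalist", "modern", "traditional")
COMPANY_AREAS = ("health care", "software", "baking")

# Assumption: reuse the page's "25 characters or less" text guidance as a hard limit.
MAX_TEXT_CHARS = 25

PROMPT_TEMPLATE = (
    "A {logo_style} logo for a {company_area} company on a solid color "
    "background. Include the text {company_name}."
)


def build_logo_prompt(logo_style: str, company_area: str, company_name: str) -> str:
    """Fill the template, accepting only values offered in the menus."""
    if logo_style not in LOGO_STYLES:
        raise ValueError(f"unsupported logo style: {logo_style!r}")
    if company_area not in COMPANY_AREAS:
        raise ValueError(f"unsupported company area: {company_area!r}")
    name = company_name.strip()
    if not name or len(name) > MAX_TEXT_CHARS:
        raise ValueError(f"company name must be 1 to {MAX_TEXT_CHARS} characters")
    return PROMPT_TEMPLATE.format(
        logo_style=logo_style, company_area=company_area, company_name=name
    )


if __name__ == "__main__":
    print(build_logo_prompt("minimalist", "health care", "Journey"))
```

The returned string would then be passed as the prompt to whichever image generation call you use; restricting inputs to fixed menus keeps the solid color background wording out of the customer's control.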
In your custom user interface, the customer can input the parameters using a menu, and their chosen value populates the prompt Imagen receives. For example: Prompt: A minimalist logo for a health care company on a solid color background. Include the text Journey . Prompt: A modern logo for a software company on a solid color background. Include the text Silo . Prompt: A traditional logo for a baking company on a solid color background. Include the text Seed . Advanced prompt writing techniques Use the following examples to create more specific prompts based on attributes like photography descriptors, shapes and materials, historical art movements, and image quality modifiers. Photography Prompt includes: "A photo of..." To use this style, start with using keywords that clearly tell Imagen that you're looking for a photograph. Start your prompts with "A photo of. . ." . For example: Prompt: A photo of coffee beans in a kitchen on a wooden surface Prompt: A photo of a chocolate bar on a kitchen counter Prompt: A photo of a modern building with water in the background Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Photography modifiers In the following examples, you can see several photography-specific modifiers and parameters. You can combine multiple modifiers for more precise control. 
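Combining modifiers amounts to appending comma-separated keywords to a base "A photo of..." prompt. The helper below is a minimal sketch of that idea; `build_photo_prompt` is a hypothetical name, and dropping blank or repeated modifiers is a design choice of this example rather than anything Imagen requires.

```python
from typing import Iterable


def build_photo_prompt(subject: str, modifiers: Iterable[str] = ()) -> str:
    """Join a photo subject with photography modifiers into one prompt string."""
    parts = [f"A photo of {subject.strip()}"]
    seen = set()
    for modifier in modifiers:
        cleaned = modifier.strip()
        # Skip empty entries and case-insensitive repeats.
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            parts.append(cleaned)
    return ", ".join(parts)


if __name__ == "__main__":
    print(build_photo_prompt("a leaf", ["macro lens", "natural lighting"]))
```

Keeping the modifiers in a list makes it easy to try one change at a time, for example swapping "natural lighting" for "dramatic lighting", and compare the generated images.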
Camera Proximity - Close up, taken from far away Prompt: A close-up photo of coffee beans Prompt: A zoomed out photo of a small bag of coffee beans in a messy kitchen Camera Position - aerial, from below Prompt: aerial photo of urban city with skyscrapers Prompt: A photo of a forest canopy with blue skies from below Lighting - natural, dramatic, warm, cold Prompt: studio photo of a modern arm chair, natural lighting Prompt: studio photo of a modern arm chair, dramatic lighting Camera Settings - motion blur, soft focus, bokeh, portrait Prompt: photo of a city with skyscrapers from the inside of a car with motion blur Prompt: soft focus photograph of a bridge in an urban city at night Lens types - 35mm, 50mm, fisheye, wide angle, macro Prompt: photo of a leaf, macro lens Prompt: street photography, new york city, fisheye lens Film types - black and white, polaroid Prompt: a polaroid portrait of a dog wearing sunglasses Prompt: black and white photo of a dog wearing sunglasses Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Illustration and art Prompt includes: "A painting of..." , "A sketch of..." Art styles vary from monochrome styles like pencil sketches, to hyper-realistic digital art. For example, the following images use the same prompt with different styles: "An [art style or creation technique] of an angular sporty electric sedan with skyscrapers in the background" Prompt: A technical pencil drawing of an angular... Prompt: A charcoal drawing of an angular... Prompt: A color pencil drawing of an angular... Prompt: A pastel painting of an angular... Prompt: A digital art of an angular... Prompt: An art deco (poster) of an angular... Image source: Each image was generated using its corresponding text prompt with the Imagen 2 model. Shapes and materials Prompt includes: "...made of..." , "...in the shape of..." 
One of the strengths of this technology is that you can create imagery that is otherwise difficult or impossible. For example, you can recreate your company logo in different materials and textures. Prompt: a duffle bag made of cheese Prompt: neon tubes in the shape of a bird Prompt: an armchair made of paper , studio photo, origami style Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Historical art references Prompt includes: "...in the style of..." Certain styles have become iconic over the years. The following are some ideas of historical painting or art styles that you can try. "generate an image in the style of [art period or movement] : a wind farm" Prompt: generate an image in the style of an impressionist painting : a wind farm Prompt: generate an image in the style of a renaissance painting : a wind farm Prompt: generate an image in the style of pop art : a wind farm Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Image quality modifiers Certain keywords can let the model know that you're looking for a high-quality asset. Examples of quality modifiers include the following: General Modifiers - high-quality, beautiful, stylized Photos - 4K, HDR, Studio Photo Art, Illustration - by a professional, detailed The following are a few examples of prompts without quality modifiers and the same prompt with quality modifiers. Prompt (no quality modifiers): a photo of a corn stalk Prompt (with quality modifiers): 4k HDR beautiful photo of a corn stalk taken by a professional photographer Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Aspect ratios Imagen image generation lets you set five distinct image aspect ratios. Square (1:1, default) - A standard square photo. Common uses for this aspect ratio include social media posts. Fullscreen (4:3) - This aspect ratio is commonly used in media or film. 
It also matches the dimensions of most older (non-widescreen) TVs and medium-format cameras. It captures more of the scene horizontally (compared to 1:1), making it a preferred aspect ratio for photography. Prompt: close up of a musician's fingers playing the piano, black and white film, vintage (4:3 aspect ratio) Prompt: A professional studio photo of french fries for a high end restaurant, in the style of a food magazine (4:3 aspect ratio) Portrait full screen (3:4) - This is the fullscreen aspect ratio rotated 90 degrees. This lets you capture more of the scene vertically compared to the 1:1 aspect ratio. Prompt: a woman hiking, close of her boots reflected in a puddle, large mountains in the background, in the style of an advertisement, dramatic angles (3:4 aspect ratio) Prompt: aerial shot of a river flowing up a mystical valley (3:4 aspect ratio) Widescreen (16:9) - This ratio has replaced 4:3 and is now the most common aspect ratio for TVs, monitors, and mobile phone screens (landscape). Use this aspect ratio when you want to capture more of the background (for example, scenic landscapes). Prompt: a man wearing all white clothing sitting on the beach, close up, golden hour lighting (16:9 aspect ratio) Portrait (9:16) - This ratio is widescreen but rotated. This is a relatively new aspect ratio that has been popularized by short-form video apps (for example, YouTube Shorts). Use this for tall objects with strong vertical orientations such as buildings, trees, waterfalls, or other similar objects. Prompt: a digital render of a massive skyscraper, modern, grand, epic with a beautiful sunset in the background (9:16 aspect ratio) Photorealistic images Different versions of the image generation model might offer a mix of artistic and photorealistic output. Use the following wording in prompts to generate more photorealistic output, based on the subject you want to generate. Note: Take these keywords as general guidance when you try to create photorealistic images. 
They aren't required to achieve your goal.

Use case | Lens type | Focal lengths | Additional details
People (portraits) | Prime, zoom | 24-35mm | black and white film, Film noir, Depth of field, duotone (mention two colors)
Food, insects, plants (objects, still life) | Macro | 60-105mm | High detail, precise focusing, controlled lighting
Sports, wildlife (motion) | Telephoto zoom | 100-400mm | Fast shutter speed, Action or movement tracking
Astronomical, landscape (wide-angle) | Wide-angle | 10-24mm | Long exposure times, sharp focus, long exposure, smooth water or clouds

Portraits

Use case | Lens type | Focal lengths | Additional details
People (portraits) | Prime, zoom | 24-35mm | black and white film, Film noir, Depth of field, duotone (mention two colors)

Using several keywords from the table, Imagen can generate the following portraits:
Prompt: A woman, 35mm portrait, blue and grey duotones (Model: imagen-3.0-generate-002)
Prompt: A woman, 35mm portrait, film noir (Model: imagen-3.0-generate-002)

Objects

Use case | Lens type | Focal lengths | Additional details
Food, insects, plants (objects, still life) | Macro | 60-105mm | High detail, precise focusing, controlled lighting

Using several keywords from the table, Imagen can generate the following object images:
Prompt: leaf of a prayer plant, macro lens, 60mm (Model: imagen-3.0-generate-002)
Prompt: a plate of pasta, 100mm Macro lens (Model: imagen-3.0-generate-002)

Motion

Use case | Lens type | Focal lengths | Additional details
Sports, wildlife (motion) | Telephoto zoom | 100-400mm | Fast shutter speed, Action or movement tracking

Using several keywords from the table, Imagen can generate the following motion images:
Prompt: a winning touchdown, fast shutter speed, movement tracking (Model: imagen-3.0-generate-002)
Prompt: A deer running in the forest, fast shutter speed, movement tracking (Model: imagen-3.0-generate-002)

Wide-angle

Use case | Lens type | Focal lengths | Additional details
Astronomical, landscape (wide-angle) | Wide-angle | 10-24mm | Long exposure times, sharp focus, long exposure, smooth water or clouds

Using several keywords from the table, Imagen can generate the following wide-angle images:
Prompt: an expansive mountain range, landscape wide angle 10mm (Model: imagen-3.0-generate-002)
Prompt: a photo of the moon, astro photography, wide angle 10mm (Model: imagen-3.0-generate-002)

What's next

Check out the Veo guide to learn how to generate videos with the Gemini API. To learn more about Gemini models, see Gemini models and Experimental models.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
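The photorealistic keyword table on this page can be applied mechanically when prompts are built in code. The following is a minimal sketch, not part of any SDK: `compose_photo_prompt` and its parameters are hypothetical names introduced here for illustration, and the helper only joins strings in the same "subject, lens keywords, details" order used by the example prompts.

```python
def compose_photo_prompt(subject, focal_length=None, lens=None, details=()):
    """Join a subject with optional photography keywords into one prompt.

    Hypothetical helper (an assumption for illustration, not an Imagen API):
    it follows the comma-separated pattern of the example prompts on this page.
    """
    # Combine focal length and lens/use-case keyword, e.g. "35mm portrait".
    lens_part = " ".join(p for p in (focal_length, lens) if p)
    # Drop empty pieces so unused keywords leave no stray commas.
    parts = [subject, lens_part, *details]
    return ", ".join(p for p in parts if p)


if __name__ == "__main__":
    print(compose_photo_prompt("A woman", focal_length="35mm",
                               lens="portrait", details=["film noir"]))
    print(compose_photo_prompt("a winning touchdown",
                               details=["fast shutter speed", "movement tracking"]))
```

The two calls reproduce the "film noir" portrait prompt and the "winning touchdown" motion prompt listed on this page; the resulting string would then be passed as the text prompt to the image generation model.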
|
text_content/docs_image-understanding_1996580a.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/image-understanding#upload-image
|
| 2 |
+
Title: Image understanding | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Image understanding Gemini models are built to be multimodal from the ground up, unlocking a wide range of image processing and computer vision tasks including but not limited to image captioning, classification, and visual question answering without having to train specialized ML models. Tip: In addition to their general multimodal capabilities, Gemini models (2.0 and newer) offer improved accuracy for specific use cases like object detection and segmentation , through additional training. See the Capabilities section for more details. Passing images to Gemini You can provide images as input to Gemini using two methods: Passing inline image data : Ideal for smaller files (total request size less than 20MB, including prompts). Uploading images using the File API : Recommended for larger files or for reusing images across multiple requests. Passing inline image data You can pass inline image data in the request to generateContent . You can provide image data as Base64 encoded strings or by reading local files directly (depending on the language). The following example shows how to read an image from a local file and pass it to generateContent API for processing. Python from google.genai import types with open ( 'path/to/small-sample.jpg' , 'rb' ) as f : image_bytes = f . read () response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ types . Part . from_bytes ( data = image_bytes , mime_type = 'image/jpeg' , ), 'Caption this image.' ] ) print ( response . 
text ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const base64ImageFile = fs . readFileSync ( "path/to/small-sample.jpg" , { encoding : "base64" , }); const contents = [ { inlineData : { mimeType : "image/jpeg" , data : base64ImageFile , }, }, { text : "Caption this image." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents , }); console . log ( response . text ); Go bytes , _ := os . ReadFile ( "path/to/small-sample.jpg" ) parts := [] * genai . Part { genai . NewPartFromBytes ( bytes , "image/jpeg" ), genai . NewPartFromText ( "Caption this image." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) REST IMG_PATH = "/path/to/your/image1.jpg" if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ { "inline_data": { "mime_type":"image/jpeg", "data": "' " $( base64 $B64FLAGS $IMG_PATH ) " '" } }, {"text": "Caption this image."}, ] }] }' 2 > /dev/null You can also fetch an image from a URL, convert it to bytes, and pass it to generateContent as shown in the following examples. Python from google import genai from google.genai import types import requests image_path = "https://goo.gle/instrument-img" image_bytes = requests . get ( image_path ) . content image = types . Part . from_bytes ( data = image_bytes , mime_type = "image/jpeg" ) client = genai . Client () response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "What is this image?" 
, image ], ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; async function main () { const ai = new GoogleGenAI ({}); const imageUrl = "https://goo.gle/instrument-img" ; const response = await fetch ( imageUrl ); const imageArrayBuffer = await response . arrayBuffer (); const base64ImageData = Buffer . from ( imageArrayBuffer ). toString ( 'base64' ); const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : [ { inlineData : { mimeType : 'image/jpeg' , data : base64ImageData , }, }, { text : "Caption this image." } ], }); console . log ( result . text ); } main (); Go package main import ( "context" "fmt" "os" "io" "net/http" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } // Download the image. imageResp , _ := http . Get ( "https://goo.gle/instrument-img" ) imageBytes , _ := io . ReadAll ( imageResp . Body ) parts := [] * genai . Part { genai . NewPartFromBytes ( imageBytes , "image/jpeg" ), genai . NewPartFromText ( "Caption this image." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST IMG_URL = "https://goo.gle/instrument-img" MIME_TYPE = $( curl -sIL " $IMG_URL " | grep -i '^content-type:' | awk -F ': ' '{print $2}' | sed 's/\r$//' | head -n 1 ) if [[ -z " $MIME_TYPE " || ! 
" $MIME_TYPE " == image/* ]] ; then MIME_TYPE = "image/jpeg" fi # Check for macOS if [[ " $( uname ) " == "Darwin" ]] ; then IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 -b 0 ) elif [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 ) else IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 -w0 ) fi curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ { "inline_data": { "mime_type":"' " $MIME_TYPE " '", "data": "' " $IMAGE_B64 " '" } }, {"text": "Caption this image."} ] }] }' 2 > /dev/null Note: Inline image data limits your total request size (text prompts, system instructions, and inline bytes) to 20MB. For larger requests, upload image files using the File API. Files API is also more efficient for scenarios that use the same image repeatedly. Uploading images using the File API For large files or to be able to use the same image file repeatedly, use the Files API. The following code uploads an image file and then uses the file in a call to generateContent . See the Files API guide for more information and examples. Python from google import genai client = genai . Client () my_file = client . files . upload ( file = "path/to/sample.jpg" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ my_file , "Caption this image." ], ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.jpg" , config : { mimeType : "image/jpeg" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Caption this image." 
, ]), }); console . log ( response . text ); } await main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } uploadedFile , _ := client . Files . UploadFromPath ( ctx , "path/to/sample.jpg" , nil ) parts := [] * genai . Part { genai . NewPartFromText ( "Caption this image." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST IMAGE_PATH = "path/to/sample.jpg" MIME_TYPE = $( file -b --mime-type " ${ IMAGE_PATH } " ) NUM_BYTES = $( wc -c < " ${ IMAGE_PATH } " ) DISPLAY_NAME = IMAGE tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. 
curl " ${ upload_url } " \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ IMAGE_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq -r ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"file_data":{"mime_type": "' " ${ MIME_TYPE } " '", "file_uri": "' " ${ file_uri } " '"}}, {"text": "Caption this image."}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Prompting with multiple images You can provide multiple images in a single prompt by including multiple image Part objects in the contents array. These can be a mix of inline data (local files or URLs) and File API references. Python from google import genai from google.genai import types client = genai . Client () # Upload the first image image1_path = "path/to/image1.jpg" uploaded_file = client . files . upload ( file = image1_path ) # Prepare the second image as inline data image2_path = "path/to/image2.png" with open ( image2_path , 'rb' ) as f : img2_bytes = f . read () # Create the prompt with text and multiple images response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "What is different between these two images?" , uploaded_file , # Use the uploaded file reference types . Part . from_bytes ( data = img2_bytes , mime_type = 'image/png' ) ] ) print ( response . 
text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); async function main () { // Upload the first image const image1_path = "path/to/image1.jpg" ; const uploadedFile = await ai . files . upload ({ file : image1_path , config : { mimeType : "image/jpeg" }, }); // Prepare the second image as inline data const image2_path = "path/to/image2.png" ; const base64Image2File = fs . readFileSync ( image2_path , { encoding : "base64" , }); // Create the prompt with text and multiple images const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ "What is different between these two images?" , createPartFromUri ( uploadedFile . uri , uploadedFile . mimeType ), { inlineData : { mimeType : "image/png" , data : base64Image2File , }, }, ]), }); console . log ( response . text ); } await main (); Go // Upload the first image image1Path := "path/to/image1.jpg" uploadedFile , _ := client . Files . UploadFromPath ( ctx , image1Path , nil ) // Prepare the second image as inline data image2Path := "path/to/image2.jpeg" imgBytes , _ := os . ReadFile ( image2Path ) parts := [] * genai . Part { genai . NewPartFromText ( "What is different between these two images?" ), genai . NewPartFromBytes ( imgBytes , "image/jpeg" ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . 
Text ()) REST # Upload the first image IMAGE1_PATH = "path/to/image1.jpg" MIME1_TYPE = $( file -b --mime-type " ${ IMAGE1_PATH } " ) NUM1_BYTES = $( wc -c < " ${ IMAGE1_PATH } " ) DISPLAY_NAME1 = IMAGE1 tmp_header_file1 = upload-header1.tmp curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header1.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM1_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME1_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME1 } '}}" 2 > /dev/null upload_url1 = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file1 } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file1 } " curl " ${ upload_url1 } " \ -H "Content-Length: ${ NUM1_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ IMAGE1_PATH } " 2 > /dev/null > file_info1.json file1_uri = $( jq ".file.uri" file_info1.json ) echo file1_uri = $file1_uri # Prepare the second image (inline) IMAGE2_PATH = "path/to/image2.png" MIME2_TYPE = $( file -b --mime-type " ${ IMAGE2_PATH } " ) if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMAGE2_BASE64 = $( base64 $B64FLAGS $IMAGE2_PATH ) # Now generate content using both images curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "What is different between these two images?"}, {"file_data":{"mime_type": "' " ${ MIME1_TYPE } " '", "file_uri": ' $file1_uri '}}, { "inline_data": { "mime_type":"' " ${ MIME2_TYPE } " '", "data": "' " $IMAGE2_BASE64 " '" } } ] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" 
response.json

Object detection

From Gemini 2.0 onwards, models are further trained to detect objects in an image and return their bounding box coordinates. The coordinates are relative to the image dimensions and scaled to [0, 1000]. You need to descale these coordinates based on your original image size.

Python

    from google import genai
    from google.genai import types
    from PIL import Image
    import json

    client = genai.Client()
    prompt = "Detect all of the prominent items in the image. The box_2d should be [ymin, xmin, ymax, xmax] normalized to 0-1000."

    image = Image.open("/path/to/image.png")

    config = types.GenerateContentConfig(
        response_mime_type="application/json"
    )

    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[image, prompt],
        config=config,
    )

    width, height = image.size
    bounding_boxes = json.loads(response.text)

    converted_bounding_boxes = []
    for bounding_box in bounding_boxes:
        # box_2d is [ymin, xmin, ymax, xmax] on a 0-1000 scale;
        # convert each value to absolute pixels.
        abs_y1 = int(bounding_box["box_2d"][0] / 1000 * height)
        abs_x1 = int(bounding_box["box_2d"][1] / 1000 * width)
        abs_y2 = int(bounding_box["box_2d"][2] / 1000 * height)
        abs_x2 = int(bounding_box["box_2d"][3] / 1000 * width)
        converted_bounding_boxes.append([abs_x1, abs_y1, abs_x2, abs_y2])

    print("Image size: ", width, height)
    print("Bounding boxes:", converted_bounding_boxes)

Note: The model also supports generating bounding boxes based on custom instructions, such as: "Show bounding boxes of all green objects in this image". It also supports custom labels like "label the items with the allergens they can contain".

For more examples, check the following notebooks in the Gemini Cookbook:
- 2D spatial understanding notebook
- Experimental 3D pointing notebook

Segmentation

Starting with Gemini 2.5, models not only detect items but also segment them and provide their contour masks. The model predicts a JSON list, where each item represents a segmentation mask. Each item has a bounding box ("box_2d") in the format [y0, x0, y1, x1] with normalized coordinates between 0 and 1000, a label ("label") that identifies the object, and finally the segmentation mask inside the bounding box, as a base64-encoded PNG that is a probability map with values between 0 and 255. The mask needs to be resized to match the bounding box dimensions, then binarized at your confidence threshold (127 for the midpoint).

Note: For better results, disable thinking by setting the thinking budget to 0. See the code sample below for an example.

Python

    from google import genai
    from google.genai import types
    from PIL import Image, ImageDraw
    import io
    import base64
    import json
    import numpy as np
    import os

    client = genai.Client()

    def parse_json(json_output: str) -> str:
        # Strip the markdown fencing if the model wrapped its JSON in it.
        lines = json_output.splitlines()
        for i, line in enumerate(lines):
            if line == "```json":
                json_output = "\n".join(lines[i + 1:])  # Remove everything before "```json"
                json_output = json_output.split("```")[0]  # Remove everything after the closing "```"
                break  # Exit the loop once "```json" is found
        return json_output

    def extract_segmentation_masks(image_path: str, output_dir: str = "segmentation_outputs"):
        # Load and resize image
        im = Image.open(image_path)
        im.thumbnail([1024, 1024], Image.Resampling.LANCZOS)

        prompt = """
        Give the segmentation masks for the wooden and glass items.
        Output a JSON list of segmentation masks where each entry contains the 2D
        bounding box in the key "box_2d", the segmentation mask in key "mask", and
        the text label in the key "label". Use descriptive labels.
        """

        config = types.GenerateContentConfig(
            # Set thinking_budget to 0 for better results in object detection
            thinking_config=types.ThinkingConfig(thinking_budget=0)
        )

        response = client.models.generate_content(
            model="gemini-2.5-flash",
            contents=[prompt, im],  # Pillow images can be passed directly; the SDK converts them
            config=config,
        )

        # Parse JSON response
        items = json.loads(parse_json(response.text))

        # Create output directory
        os.makedirs(output_dir, exist_ok=True)

        # Process each mask
        for i, item in enumerate(items):
            # Get bounding box coordinates in pixels
            box = item["box_2d"]
            y0 = int(box[0] / 1000 * im.size[1])
            x0 = int(box[1] / 1000 * im.size[0])
            y1 = int(box[2] / 1000 * im.size[1])
            x1 = int(box[3] / 1000 * im.size[0])

            # Skip invalid boxes
            if y0 >= y1 or x0 >= x1:
                continue

            # Process mask
            png_str = item["mask"]
            if not png_str.startswith("data:image/png;base64,"):
                continue

            # Remove prefix
            png_str = png_str.removeprefix("data:image/png;base64,")
            mask_data = base64.b64decode(png_str)
            mask = Image.open(io.BytesIO(mask_data))

            # Resize mask to match bounding box
            mask = mask.resize((x1 - x0, y1 - y0), Image.Resampling.BILINEAR)

            # Convert mask to numpy array for processing
            mask_array = np.array(mask)

            # Create overlay for this mask
            overlay = Image.new('RGBA', im.size, (0, 0, 0, 0))
            overlay_draw = ImageDraw.Draw(overlay)

            # Paint every pixel whose mask value is above the midpoint threshold
            color = (255, 255, 255, 200)
            for y in range(y0, y1):
                for x in range(x0, x1):
                    if mask_array[y - y0, x - x0] > 127:
                        overlay_draw.point((x, y), fill=color)

            # Save individual mask and its overlay
            mask_filename = f"{item['label']}_{i}_mask.png"
            overlay_filename = f"{item['label']}_{i}_overlay.png"

            mask.save(os.path.join(output_dir, mask_filename))

            # Create and save overlay
            composite = Image.alpha_composite(im.convert('RGBA'), overlay)
            composite.save(os.path.join(output_dir, overlay_filename))
            print(f"Saved mask and overlay for {item['label']} to {output_dir}")

    # Example usage
    if __name__ == "__main__":
        extract_segmentation_masks("path/to/image.png")

Check the segmentation example in the cookbook guide for a more detailed example.

[Figure: An example segmentation output with objects and segmentation masks]

Supported image formats

Gemini supports the following image format MIME types:
- PNG - image/png
- JPEG - image/jpeg
- WEBP - image/webp
- HEIC - image/heic
- HEIF - image/heif

Capabilities

All Gemini model versions are multimodal and can be used in a wide range of image processing and computer vision tasks including but not limited to image captioning, visual question answering, image classification, object detection, and segmentation. Gemini can reduce the need to use specialized ML models depending on your quality and performance requirements.

Some later model versions are specifically trained to improve the accuracy of specialized tasks in addition to generic capabilities:
- Gemini 2.0 models are further trained to support enhanced object detection.
- Gemini 2.5 models are further trained to support enhanced segmentation in addition to object detection.

Limitations and key technical information

File limit

Gemini 2.5 Pro/Flash, 2.0 Flash, 1.5 Pro, and 1.5 Flash support a maximum of 3,600 image files per request.

Token calculation

- Gemini 1.5 Flash and Gemini 1.5 Pro: 258 tokens if both dimensions <= 384 pixels. Larger images are tiled (min tile 256px, max 768px, resized to 768x768), with each tile costing 258 tokens.
- Gemini 2.0 Flash and Gemini 2.5 Flash/Pro: 258 tokens if both dimensions <= 384 pixels. Larger images are tiled into 768x768 pixel tiles, each costing 258 tokens.

Tips and best practices

- Verify that images are correctly rotated.
- Use clear, non-blurry images.
- When using a single image with text, place the text prompt after the image part in the contents array.
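To make the token calculation rule for Gemini 2.0 and 2.5 models concrete, here is a rough estimator. It is only a sketch under a stated assumption: the text above does not say exactly how larger images are cropped or scaled before tiling, so this version assumes a plain grid of 768x768 tiles with partial tiles rounded up. The function name `estimate_image_tokens` and the constants are introduced here for illustration and are not part of the SDK.

```python
import math

TOKENS_PER_TILE = 258  # per-image / per-tile cost stated in the guide
SMALL_IMAGE_MAX = 384  # both dimensions at or below this: flat 258 tokens
TILE_SIZE = 768        # tile edge for Gemini 2.0 / 2.5 models


def estimate_image_tokens(width: int, height: int) -> int:
    """Rough token estimate for one image on Gemini 2.0/2.5 models.

    Assumption (not confirmed by the guide): larger images are covered by a
    simple grid of 768x768 tiles, and partial tiles count as full tiles.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Small images cost a flat 258 tokens.
    if width <= SMALL_IMAGE_MAX and height <= SMALL_IMAGE_MAX:
        return TOKENS_PER_TILE
    # Otherwise count grid tiles in each direction, rounding up.
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return tiles * TOKENS_PER_TILE


if __name__ == "__main__":
    for size in [(384, 384), (1536, 768), (769, 769)]:
        print(size, estimate_image_tokens(*size))
```

Under this assumption a 1536x768 image spans two tiles (516 tokens) and a 769x769 image spans four (1032 tokens). Treat the result as a planning estimate only; the actual preprocessing may differ, so rely on the token counts reported by the API when accuracy matters.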
What's next This guide shows you how to upload image files and generate text outputs from image inputs. To learn more, see the following resources: Files API : Learn more about uploading and managing files for use with Gemini. System instructions : System instructions let you steer the behavior of the model based on your specific needs and use cases. File prompting strategies : The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting. Safety guidance : Sometimes generative AI models produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-08 UTC.
|
text_content/docs_image-understanding_26d88c6b.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/image-understanding#capabilities
|
| 2 |
+
Title: Image understanding | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Image understanding Gemini models are built to be multimodal from the ground up, unlocking a wide range of image processing and computer vision tasks including but not limited to image captioning, classification, and visual question answering without having to train specialized ML models. Tip: In addition to their general multimodal capabilities, Gemini models (2.0 and newer) offer improved accuracy for specific use cases like object detection and segmentation , through additional training. See the Capabilities section for more details. Passing images to Gemini You can provide images as input to Gemini using two methods: Passing inline image data : Ideal for smaller files (total request size less than 20MB, including prompts). Uploading images using the File API : Recommended for larger files or for reusing images across multiple requests. Passing inline image data You can pass inline image data in the request to generateContent . You can provide image data as Base64 encoded strings or by reading local files directly (depending on the language). The following example shows how to read an image from a local file and pass it to generateContent API for processing. Python from google.genai import types with open ( 'path/to/small-sample.jpg' , 'rb' ) as f : image_bytes = f . read () response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ types . Part . from_bytes ( data = image_bytes , mime_type = 'image/jpeg' , ), 'Caption this image.' ] ) print ( response . 
text ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const base64ImageFile = fs . readFileSync ( "path/to/small-sample.jpg" , { encoding : "base64" , }); const contents = [ { inlineData : { mimeType : "image/jpeg" , data : base64ImageFile , }, }, { text : "Caption this image." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents , }); console . log ( response . text ); Go bytes , _ := os . ReadFile ( "path/to/small-sample.jpg" ) parts := [] * genai . Part { genai . NewPartFromBytes ( bytes , "image/jpeg" ), genai . NewPartFromText ( "Caption this image." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) REST IMG_PATH = "/path/to/your/image1.jpg" if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ { "inline_data": { "mime_type":"image/jpeg", "data": "' " $( base64 $B64FLAGS $IMG_PATH ) " '" } }, {"text": "Caption this image."}, ] }] }' 2 > /dev/null You can also fetch an image from a URL, convert it to bytes, and pass it to generateContent as shown in the following examples. Python from google import genai from google.genai import types import requests image_path = "https://goo.gle/instrument-img" image_bytes = requests . get ( image_path ) . content image = types . Part . from_bytes ( data = image_bytes , mime_type = "image/jpeg" ) client = genai . Client () response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "What is this image?" 
, image ], ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; async function main () { const ai = new GoogleGenAI ({}); const imageUrl = "https://goo.gle/instrument-img" ; const response = await fetch ( imageUrl ); const imageArrayBuffer = await response . arrayBuffer (); const base64ImageData = Buffer . from ( imageArrayBuffer ). toString ( 'base64' ); const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : [ { inlineData : { mimeType : 'image/jpeg' , data : base64ImageData , }, }, { text : "Caption this image." } ], }); console . log ( result . text ); } main (); Go package main import ( "context" "fmt" "io" "log" "net/http" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } // Download the image. imageResp , _ := http . Get ( "https://goo.gle/instrument-img" ) imageBytes , _ := io . ReadAll ( imageResp . Body ) parts := [] * genai . Part { genai . NewPartFromBytes ( imageBytes , "image/jpeg" ), genai . NewPartFromText ( "Caption this image." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST IMG_URL = "https://goo.gle/instrument-img" MIME_TYPE = $( curl -sIL " $IMG_URL " | grep -i '^content-type:' | awk -F ': ' '{print $2}' | sed 's/\r$//' | head -n 1 ) if [[ -z " $MIME_TYPE " || ! 
" $MIME_TYPE " == image/* ]] ; then MIME_TYPE = "image/jpeg" fi # Check for macOS if [[ " $( uname ) " == "Darwin" ]] ; then IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 -b 0 ) elif [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 ) else IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 -w0 ) fi curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ { "inline_data": { "mime_type":"' " $MIME_TYPE " '", "data": "' " $IMAGE_B64 " '" } }, {"text": "Caption this image."} ] }] }' 2 > /dev/null Note: Inline image data limits your total request size (text prompts, system instructions, and inline bytes) to 20MB. For larger requests, upload image files using the File API. Files API is also more efficient for scenarios that use the same image repeatedly. Uploading images using the File API For large files or to be able to use the same image file repeatedly, use the Files API. The following code uploads an image file and then uses the file in a call to generateContent . See the Files API guide for more information and examples. Python from google import genai client = genai . Client () my_file = client . files . upload ( file = "path/to/sample.jpg" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ my_file , "Caption this image." ], ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.jpg" , config : { mimeType : "image/jpeg" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Caption this image." 
, ]), }); console . log ( response . text ); } await main (); Go package main import ( "context" "fmt" "log" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } uploadedFile , _ := client . Files . UploadFromPath ( ctx , "path/to/sample.jpg" , nil ) parts := [] * genai . Part { genai . NewPartFromText ( "Caption this image." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST IMAGE_PATH = "path/to/sample.jpg" MIME_TYPE = $( file -b --mime-type " ${ IMAGE_PATH } " ) NUM_BYTES = $( wc -c < " ${ IMAGE_PATH } " ) DISPLAY_NAME = IMAGE tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload URL is in the response headers; dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. 
curl " ${ upload_url } " \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ IMAGE_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq -r ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"file_data":{"mime_type": "' " ${ MIME_TYPE } " '", "file_uri": "' " ${ file_uri } " '"}}, {"text": "Caption this image."}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Prompting with multiple images You can provide multiple images in a single prompt by including multiple image Part objects in the contents array. These can be a mix of inline data (local files or URLs) and File API references. Python from google import genai from google.genai import types client = genai . Client () # Upload the first image image1_path = "path/to/image1.jpg" uploaded_file = client . files . upload ( file = image1_path ) # Prepare the second image as inline data image2_path = "path/to/image2.png" with open ( image2_path , 'rb' ) as f : img2_bytes = f . read () # Create the prompt with text and multiple images response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "What is different between these two images?" , uploaded_file , # Use the uploaded file reference types . Part . from_bytes ( data = img2_bytes , mime_type = 'image/png' ) ] ) print ( response . 
text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); async function main () { // Upload the first image const image1_path = "path/to/image1.jpg" ; const uploadedFile = await ai . files . upload ({ file : image1_path , config : { mimeType : "image/jpeg" }, }); // Prepare the second image as inline data const image2_path = "path/to/image2.png" ; const base64Image2File = fs . readFileSync ( image2_path , { encoding : "base64" , }); // Create the prompt with text and multiple images const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ "What is different between these two images?" , createPartFromUri ( uploadedFile . uri , uploadedFile . mimeType ), { inlineData : { mimeType : "image/png" , data : base64Image2File , }, }, ]), }); console . log ( response . text ); } await main (); Go // Upload the first image image1Path := "path/to/image1.jpg" uploadedFile , _ := client . Files . UploadFromPath ( ctx , image1Path , nil ) // Prepare the second image as inline data image2Path := "path/to/image2.jpeg" imgBytes , _ := os . ReadFile ( image2Path ) parts := [] * genai . Part { genai . NewPartFromText ( "What is different between these two images?" ), genai . NewPartFromBytes ( imgBytes , "image/jpeg" ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . 
Text ()) REST # Upload the first image IMAGE1_PATH = "path/to/image1.jpg" MIME1_TYPE = $( file -b --mime-type " ${ IMAGE1_PATH } " ) NUM1_BYTES = $( wc -c < " ${ IMAGE1_PATH } " ) DISPLAY_NAME1 = IMAGE1 tmp_header_file1 = upload-header1.tmp curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header1.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM1_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME1_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME1 } '}}" 2 > /dev/null upload_url1 = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file1 } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file1 } " curl " ${ upload_url1 } " \ -H "Content-Length: ${ NUM1_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ IMAGE1_PATH } " 2 > /dev/null > file_info1.json file1_uri = $( jq ".file.uri" file_info1.json ) echo file1_uri = $file1_uri # Prepare the second image (inline) IMAGE2_PATH = "path/to/image2.png" MIME2_TYPE = $( file -b --mime-type " ${ IMAGE2_PATH } " ) if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMAGE2_BASE64 = $( base64 $B64FLAGS $IMAGE2_PATH ) # Now generate content using both images curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "What is different between these two images?"}, {"file_data":{"mime_type": "' " ${ MIME1_TYPE } " '", "file_uri": ' $file1_uri '}}, { "inline_data": { "mime_type":"' " ${ MIME2_TYPE } " '", "data": "' " $IMAGE2_BASE64 " '" } } ] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" 
response.json Object detection From Gemini 2.0 onwards, models are further trained to detect objects in an image and return their bounding box coordinates. The coordinates are normalized to the range [0, 1000] relative to the image dimensions, so you need to rescale them to pixel values based on your original image size. Python from google import genai from google.genai import types from PIL import Image import json client = genai . Client () prompt = "Detect all of the prominent items in the image. The box_2d should be [ymin, xmin, ymax, xmax] normalized to 0-1000." image = Image . open ( "/path/to/image.png" ) config = types . GenerateContentConfig ( response_mime_type = "application/json" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ image , prompt ], config = config ) width , height = image . size bounding_boxes = json . loads ( response . text ) converted_bounding_boxes = [] for bounding_box in bounding_boxes : abs_y1 = int ( bounding_box [ "box_2d" ][ 0 ] / 1000 * height ) abs_x1 = int ( bounding_box [ "box_2d" ][ 1 ] / 1000 * width ) abs_y2 = int ( bounding_box [ "box_2d" ][ 2 ] / 1000 * height ) abs_x2 = int ( bounding_box [ "box_2d" ][ 3 ] / 1000 * width ) converted_bounding_boxes . append ([ abs_x1 , abs_y1 , abs_x2 , abs_y2 ]) print ( "Image size: " , width , height ) print ( "Bounding boxes:" , converted_bounding_boxes ) Note: The model also supports generating bounding boxes based on custom instructions, such as: "Show bounding boxes of all green objects in this image". It also supports custom labels like "label the items with the allergens they can contain". For more examples, check the following notebooks in the Gemini Cookbook: 2D spatial understanding notebook Experimental 3D pointing notebook Segmentation Starting with Gemini 2.5, models not only detect items but also segment them and provide their contour masks. The model predicts a JSON list, where each item represents a segmentation mask. 
Each item has a bounding box ("box_2d") in the format [y0, x0, y1, x1] with normalized coordinates between 0 and 1000, a label ("label") that identifies the object, and finally the segmentation mask inside the bounding box, as a base64-encoded PNG that is a probability map with values between 0 and 255. The mask needs to be resized to match the bounding box dimensions, then binarized at your confidence threshold (127 for the midpoint). Note: For better results, disable thinking by setting the thinking budget to 0. See the code sample below for an example. Python from google import genai from google.genai import types from PIL import Image , ImageDraw import io import base64 import json import numpy as np import os client = genai . Client () def parse_json ( json_output : str ): # Parsing out the markdown fencing lines = json_output . splitlines () for i , line in enumerate ( lines ): if line == "```json" : json_output = "\n" . join ( lines [ i + 1 :]) # Remove everything before "```json" json_output = json_output . split ( "```" )[ 0 ] # Remove everything after the closing "```" break # Exit the loop once "```json" is found return json_output def extract_segmentation_masks ( image_path : str , output_dir : str = "segmentation_outputs" ): # Load and resize image im = Image . open ( image_path ) im . thumbnail ([ 1024 , 1024 ], Image . Resampling . LANCZOS ) prompt = """ Give the segmentation masks for the wooden and glass items. Output a JSON list of segmentation masks where each entry contains the 2D bounding box in the key "box_2d", the segmentation mask in key "mask", and the text label in the key "label". Use descriptive labels. """ config = types . GenerateContentConfig ( thinking_config = types . ThinkingConfig ( thinking_budget = 0 ) # set thinking_budget to 0 for better results in object detection ) response = client . models . 
generate_content ( model = "gemini-2.5-flash" , contents = [ prompt , im ], # Pillow images can be directly passed as inputs (which will be converted by the SDK) config = config ) # Parse JSON response items = json . loads ( parse_json ( response . text )) # Create output directory os . makedirs ( output_dir , exist_ok = True ) # Process each mask for i , item in enumerate ( items ): # Get bounding box coordinates box = item [ "box_2d" ] y0 = int ( box [ 0 ] / 1000 * im . size [ 1 ]) x0 = int ( box [ 1 ] / 1000 * im . size [ 0 ]) y1 = int ( box [ 2 ] / 1000 * im . size [ 1 ]) x1 = int ( box [ 3 ] / 1000 * im . size [ 0 ]) # Skip invalid boxes if y0 >= y1 or x0 >= x1 : continue # Process mask png_str = item [ "mask" ] if not png_str . startswith ( "data:image/png;base64," ): continue # Remove prefix png_str = png_str . removeprefix ( "data:image/png;base64," ) mask_data = base64 . b64decode ( png_str ) mask = Image . open ( io . BytesIO ( mask_data )) # Resize mask to match bounding box mask = mask . resize (( x1 - x0 , y1 - y0 ), Image . Resampling . BILINEAR ) # Convert mask to numpy array for processing mask_array = np . array ( mask ) # Create overlay for this mask overlay = Image . new ( 'RGBA' , im . size , ( 0 , 0 , 0 , 0 )) overlay_draw = ImageDraw . Draw ( overlay ) # Create overlay for the mask color = ( 255 , 255 , 255 , 200 ) for y in range ( y0 , y1 ): for x in range ( x0 , x1 ): if mask_array [ y - y0 , x - x0 ] > 128 : # Threshold for mask overlay_draw . point (( x , y ), fill = color ) # Save individual mask and its overlay mask_filename = f"{item['label']}_{i}_mask.png" overlay_filename = f"{item['label']}_{i}_overlay.png" mask . save ( os . path . join ( output_dir , mask_filename )) # Create and save overlay composite = Image . alpha_composite ( im . convert ( 'RGBA' ), overlay ) composite . save ( os . path . 
join ( output_dir , overlay_filename )) print ( f"Saved mask and overlay for {item['label']} to {output_dir}" ) # Example usage if __name__ == "__main__" : extract_segmentation_masks ( "path/to/image.png" ) Check the segmentation example in the cookbook guide for a more detailed example. An example segmentation output with objects and segmentation masks Supported image formats Gemini supports the following image format MIME types: PNG - image/png JPEG - image/jpeg WEBP - image/webp HEIC - image/heic HEIF - image/heif Capabilities All Gemini model versions are multimodal and can be used in a wide range of image processing and computer vision tasks including but not limited to image captioning, visual question answering, image classification, object detection and segmentation. Gemini can reduce the need to use specialized ML models depending on your quality and performance requirements. Some later model versions are specifically trained to improve the accuracy of specialized tasks in addition to generic capabilities: Gemini 2.0 models are further trained to support enhanced object detection. Gemini 2.5 models are further trained to support enhanced segmentation in addition to object detection. Limitations and key technical information File limit Gemini 2.5 Pro/Flash, 2.0 Flash, 1.5 Pro, and 1.5 Flash support a maximum of 3,600 image files per request. Token calculation Gemini 1.5 Flash and Gemini 1.5 Pro: 258 tokens if both dimensions <= 384 pixels. Larger images are tiled (min tile 256px, max 768px, resized to 768x768), with each tile costing 258 tokens. Gemini 2.0 Flash and Gemini 2.5 Flash/Pro: 258 tokens if both dimensions <= 384 pixels. Larger images are tiled into 768x768 pixel tiles, each costing 258 tokens. Tips and best practices Verify that images are correctly rotated. Use clear, non-blurry images. When using a single image with text, place the text prompt after the image part in the contents array.
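The token rule quoted above for Gemini 2.0 and 2.5 models can be turned into a rough budgeting helper. The sketch below is only an estimate: it assumes larger images are covered by a plain grid of 768x768 tiles, which the guide does not spell out, so the service's actual cropping and scaling may give a different count. The function name estimate_image_tokens and the constants are hypothetical and are not part of any SDK.

```python
import math

# Values taken from the guide's "Token calculation" paragraph.
TOKENS_PER_TILE = 258
SMALL_IMAGE_MAX = 384
TILE_SIZE = 768


def estimate_image_tokens(width: int, height: int) -> int:
    """Estimate the input tokens for one image (hypothetical helper).

    Assumption: images larger than 384px on either side are covered by a
    simple grid of 768x768 tiles. The real tiling may differ.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Small images cost a flat 258 tokens.
    if width <= SMALL_IMAGE_MAX and height <= SMALL_IMAGE_MAX:
        return TOKENS_PER_TILE
    # Otherwise count the grid tiles needed to cover the image.
    tiles = math.ceil(width / TILE_SIZE) * math.ceil(height / TILE_SIZE)
    return tiles * TOKENS_PER_TILE


if __name__ == "__main__":
    for size in [(384, 384), (768, 768), (1536, 1024)]:
        print(size, estimate_image_tokens(*size))
```

For exact numbers, rely on the usage metadata returned with a response rather than on a local estimate like this one.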
What's next This guide shows you how to upload image files and generate text outputs from image inputs. To learn more, see the following resources: Files API: Learn more about uploading and managing files for use with Gemini. System instructions: System instructions let you steer the behavior of the model based on your specific needs and use cases. File prompting strategies: The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting. Safety guidance: Sometimes generative AI models produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs. Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-08 UTC.
|
text_content/docs_live-guide_df92dcd6.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/live-guide#interruptions
|
| 2 |
+
Title: Live API capabilities guide | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Live API capabilities guide Preview: The Live API is in preview. This is a comprehensive guide that covers capabilities and configurations available with the Live API. See the Get started with Live API page for an overview and sample code for common use cases. Before you begin Familiarize yourself with core concepts: If you haven't already done so, read the Get started with Live API page first. This will introduce you to the fundamental principles of the Live API, how it works, and the distinction between the different models and their corresponding audio generation methods (native audio or half-cascade). Try the Live API in AI Studio: You may find it useful to try the Live API in Google AI Studio before you start building. To use the Live API in Google AI Studio, select Stream. Establishing a connection The following example shows how to create a connection with an API key: Python import asyncio from google import genai client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : print ( "Session started" ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . TEXT ] }; async function main () { const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . 
debug ( 'Opened' ); }, onmessage : function ( message ) { console . debug ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send content... session . close (); } main (); Note: You can only set one modality in the response_modalities field. This means that you can configure the model to respond with either text or audio, but not both in the same session. Interaction modalities The following sections provide examples and supporting context for the different input and output modalities available in the Live API. Sending and receiving text Here's how you can send and receive text: Python import asyncio from google import genai client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : message = "Hello, how are you?" await session . send_client_content ( turns = { "role" : "user" , "parts" : [{ "text" : message }]}, turn_complete = True ) async for response in session . receive (): if response . text is not None : print ( response . text , end = "" ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . TEXT ] }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) => setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . 
push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Hello how are you?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . text ) { console . debug ( 'Received text: %s\n' , turn . text ); } else if ( turn . data ) { console . debug ( 'Received inline data: %s\n' , turn . data ); } } session . close (); } async function main () { await live (). catch (( e ) => console . error ( 'got error' , e )); } main (); Incremental content updates Use incremental updates to send text input, establish session context, or restore session context. For short contexts you can send turn-by-turn interactions to represent the exact sequence of events: Python turns = [ { "role" : "user" , "parts" : [{ "text" : "What is the capital of France?" }]}, { "role" : "model" , "parts" : [{ "text" : "Paris" }]}, ] await session . send_client_content ( turns = turns , turn_complete = False ) turns = [{ "role" : "user" , "parts" : [{ "text" : "What is the capital of Germany?" }]}] await session . send_client_content ( turns = turns , turn_complete = True ) JavaScript let inputTurns = [ { "role" : "user" , "parts" : [{ "text" : "What is the capital of France?" }] }, { "role" : "model" , "parts" : [{ "text" : "Paris" }] }, ] session . sendClientContent ({ turns : inputTurns , turnComplete : false }) inputTurns = [{ "role" : "user" , "parts" : [{ "text" : "What is the capital of Germany?" }] }] session . 
sendClientContent ({ turns : inputTurns , turnComplete : true }) For longer contexts it's recommended to provide a single message summary to free up the context window for subsequent interactions. See Session Resumption for another method for loading session context. Sending and receiving audio The most common audio example, audio-to-audio , is covered in the Getting started guide. Here's an audio-to-text example that reads a WAV file, sends it in the correct format and receives text output: Python # Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav # Install helpers for converting files: pip install librosa soundfile import asyncio import io from pathlib import Path from google import genai from google.genai import types import soundfile as sf import librosa client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : buffer = io . BytesIO () y , sr = librosa . load ( "sample.wav" , sr = 16000 ) sf . write ( buffer , y , sr , format = 'RAW' , subtype = 'PCM_16' ) buffer . seek ( 0 ) audio_bytes = buffer . read () # If already in correct format, you can use this: # audio_bytes = Path("sample.pcm").read_bytes() await session . send_realtime_input ( audio = types . Blob ( data = audio_bytes , mime_type = "audio/pcm;rate=16000" ) ) async for response in session . receive (): if response . text is not None : print ( response . text ) if __name__ == "__main__" : asyncio . 
run ( main ()) JavaScript // Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav // Install helpers for converting files: npm install wavefile import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; import pkg from 'wavefile' ; const { WaveFile } = pkg ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . TEXT ] }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send Audio Chunk const fileBuffer = fs . readFileSync ( "sample.wav" ); // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono) const wav = new WaveFile (); wav . fromBuffer ( fileBuffer ); wav . toSampleRate ( 16000 ); wav . toBitDepth ( "16" ); const base64Audio = wav . toBase64 (); // If already in correct format, you can use this: // const fileBuffer = fs.readFileSync("sample.pcm"); // const base64Audio = Buffer.from(fileBuffer).toString('base64'); session . 
sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . text ) { console . debug ( 'Received text: %s\n' , turn . text ); } else if ( turn . data ) { console . debug ( 'Received inline data: %s\n' , turn . data ); } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); And here is a text-to-audio example. You can receive audio by setting AUDIO as response modality. This example saves the received data as WAV file: Python import asyncio import wave from google import genai client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "AUDIO" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : wf = wave . open ( "audio.wav" , "wb" ) wf . setnchannels ( 1 ) wf . setsampwidth ( 2 ) wf . setframerate ( 24000 ) message = "Hello how are you?" await session . send_client_content ( turns = { "role" : "user" , "parts" : [{ "text" : message }]}, turn_complete = True ) async for response in session . receive (): if response . data is not None : wf . writeframes ( response . data ) # Un-comment this code to print audio data info # if response.server_content.model_turn is not None: # print(response.server_content.model_turn.parts[0].inline_data.mime_type) wf . close () if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; import pkg from 'wavefile' ; const { WaveFile } = pkg ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . AUDIO ] }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . 
shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) => setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Hello how are you?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); // Combine audio data strings and save as wave file const combinedAudio = turns . reduce (( acc , turn ) => { if ( turn . data ) { const buffer = Buffer . from ( turn . data , 'base64' ); const intArray = new Int16Array ( buffer . buffer , buffer . byteOffset , buffer . byteLength / Int16Array . BYTES_PER_ELEMENT ); return acc . concat ( Array . from ( intArray )); } return acc ; }, []); const audioBuffer = new Int16Array ( combinedAudio ); const wf = new WaveFile (); wf . fromScratch ( 1 , 24000 , '16' , audioBuffer ); fs . writeFileSync ( 'output.wav' , wf . toBuffer ()); session . close (); } async function main () { await live (). catch (( e ) => console . error ( 'got error' , e )); } main (); Audio formats Audio data in the Live API is always raw, little-endian, 16-bit PCM. Audio output always uses a sample rate of 24kHz. Input audio is natively 16kHz, but the Live API will resample if needed so any sample rate can be sent. To convey the sample rate of input audio, set the MIME type of each audio-containing Blob to a value like audio/pcm;rate=16000.
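As an illustration of the format described above, the following standard-library sketch packs floating-point samples into raw little-endian 16-bit PCM and builds the matching MIME type string. The helper names float_to_pcm16le and pcm_mime_type are hypothetical and not part of the SDK, and the sketch assumes mono samples already scaled to the range [-1.0, 1.0].

```python
import struct


def float_to_pcm16le(samples):
    """Pack float samples in [-1.0, 1.0] as raw little-endian 16-bit PCM.

    Hypothetical helper: out-of-range values are clipped, then scaled to
    the signed 16-bit range.
    """
    ints = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))
        ints.append(int(round(clipped * 32767)))
    # "<" selects little-endian byte order, "h" is a signed 16-bit integer.
    return struct.pack("<%dh" % len(ints), *ints)


def pcm_mime_type(sample_rate):
    """Build the MIME type that carries the input sample rate."""
    return f"audio/pcm;rate={sample_rate}"


if __name__ == "__main__":
    pcm = float_to_pcm16le([0.0, 0.5, -0.5])
    print(len(pcm), "bytes,", pcm_mime_type(16000))
```

The resulting bytes and MIME type are what the earlier examples pass as the data and mime_type of an audio Blob.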
Audio transcriptions You can enable transcription of the model's audio output by sending output_audio_transcription in the setup config. The transcription language is inferred from the model's response. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "AUDIO" ], "output_audio_transcription" : {} } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : message = "Hello? Gemini are you there?" await session . send_client_content ( turns = { "role" : "user" , "parts" : [{ "text" : message }]}, turn_complete = True ) async for response in session . receive (): if response . server_content . model_turn : print ( "Model turn:" , response . server_content . model_turn ) if response . server_content . output_transcription : print ( "Transcript:" , response . server_content . output_transcription . text ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . AUDIO ], outputAudioTranscription : {} }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . 
debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Hello how are you?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . outputTranscription ) { console . debug ( 'Received output transcription: %s\n' , turn . serverContent . outputTranscription . text ); } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); You can enable transcription of the audio input by sending input_audio_transcription in setup config. Python import asyncio from pathlib import Path from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ], "input_audio_transcription" : {}, } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : audio_data = Path ( "16000.pcm" ) . read_bytes () await session . send_realtime_input ( audio = types . Blob ( data = audio_data , mime_type = 'audio/pcm;rate=16000' ) ) async for msg in session . receive (): if msg . server_content . input_transcription : print ( 'Transcript:' , msg . server_content . input_transcription . text ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; import pkg from 'wavefile' ; const { WaveFile } = pkg ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . 
TEXT ], inputAudioTranscription : {} }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send Audio Chunk const fileBuffer = fs . readFileSync ( "16000.wav" ); // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono) const wav = new WaveFile (); wav . fromBuffer ( fileBuffer ); wav . toSampleRate ( 16000 ); wav . toBitDepth ( "16" ); const base64Audio = wav . toBase64 (); // If already in correct format, you can use this: // const fileBuffer = fs.readFileSync("sample.pcm"); // const base64Audio = Buffer.from(fileBuffer).toString('base64'); session . sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . outputTranscription ) { console . log ( "Transcription" ) console . log ( turn . serverContent . outputTranscription . text ); } } for ( const turn of turns ) { if ( turn . text ) { console . debug ( 'Received text: %s\n' , turn . text ); } else if ( turn . data ) { console . 
debug ( 'Received inline data: %s\n' , turn . data ); } else if ( turn . serverContent && turn . serverContent . inputTranscription ) { console . debug ( 'Received input transcription: %s\n' , turn . serverContent . inputTranscription . text ); } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Stream audio and video To see an example of how to use the Live API in a streaming audio and video format, run the "Live API - Get Started" file in the cookbooks repository: View on Colab Change voice and language The Live API models each support a different set of voices. Half-cascade supports Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr. Native audio supports a much longer list (identical to the TTS model list ). You can listen to all the voices in AI Studio . To specify a voice, set the voice name within the speechConfig object as part of the session configuration: Python config = { "response_modalities" : [ "AUDIO" ], "speech_config" : { "voice_config" : { "prebuilt_voice_config" : { "voice_name" : "Kore" }} }, } JavaScript const config = { responseModalities : [ Modality . AUDIO ], speechConfig : { voiceConfig : { prebuiltVoiceConfig : { voiceName : "Kore" } } } }; Note: If you're using the generateContent API, the set of available voices is slightly different. See the audio generation guide for generateContent audio generation voices. The Live API supports multiple languages . To change the language, set the language code within the speechConfig object as part of the session configuration: Python config = { "response_modalities" : [ "AUDIO" ], "speech_config" : { "language_code" : "de-DE" } } JavaScript const config = { responseModalities : [ Modality . AUDIO ], speechConfig : { languageCode : "de-DE" } }; Note: Native audio output models automatically choose the appropriate language and don't support explicitly setting the language code. 
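The voice and language settings above can be assembled by a small helper. This is a sketch under stated assumptions: build_speech_config and its native_audio flag are hypothetical names introduced here, and placing voice_config and language_code in the same speech_config is an assumption, since the snippets above show them separately. The helper omits the language code for native audio models, following the note above.

```python
from typing import Optional


def build_speech_config(
    voice_name: Optional[str] = None,
    language_code: Optional[str] = None,
    native_audio: bool = False,
) -> dict:
    """Return a session config dict with AUDIO output and optional speech settings.

    Hypothetical helper: only the dictionary keys come from the guide above.
    """
    speech_config: dict = {}
    if voice_name is not None:
        speech_config["voice_config"] = {
            "prebuilt_voice_config": {"voice_name": voice_name}
        }
    # Native audio output models choose the language themselves,
    # so the language code is dropped for them.
    if language_code is not None and not native_audio:
        speech_config["language_code"] = language_code

    config: dict = {"response_modalities": ["AUDIO"]}
    if speech_config:
        config["speech_config"] = speech_config
    return config
```

Calling build_speech_config("Kore", "de-DE") produces a config with both settings, while passing native_audio=True keeps only the voice.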
Native audio capabilities The following capabilities are only available with native audio. You can learn more about native audio in Choose a model and audio generation . Note: Native audio models currently have limited tool use support. See Overview of supported tools for details. How to use native audio output To use native audio output, configure one of the native audio models and set response_modalities to AUDIO . See Send and receive audio for a full example. Python model = "gemini-2.5-flash-preview-native-audio-dialog" config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ]) async with client . aio . live . connect ( model = model , config = config ) as session : # Send audio input and receive audio JavaScript const model = 'gemini-2.5-flash-preview-native-audio-dialog' ; const config = { responseModalities : [ Modality . AUDIO ] }; async function main () { const session = await ai . live . connect ({ model : model , config : config , callbacks : ..., }); // Send audio input and receive audio session . close (); } main (); Affective dialog This feature lets Gemini adapt its response style to the input expression and tone. To use affective dialog, set the api version to v1alpha and set enable_affective_dialog to true in the setup message: Python client = genai . Client ( http_options = { "api_version" : "v1alpha" }) config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ], enable_affective_dialog = True ) JavaScript const ai = new GoogleGenAI ({ httpOptions : { "apiVersion" : "v1alpha" } }); const config = { responseModalities : [ Modality . AUDIO ], enableAffectiveDialog : true }; Note that affective dialog is currently only supported by the native audio output models. Proactive audio When this feature is enabled, Gemini can proactively decide not to respond if the content is not relevant. 
To use it, set the api version to v1alpha and configure the proactivity field in the setup message and set proactive_audio to true : Python client = genai . Client ( http_options = { "api_version" : "v1alpha" }) config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ], proactivity = { 'proactive_audio' : True } ) JavaScript const ai = new GoogleGenAI ({ httpOptions : { "apiVersion" : "v1alpha" } }); const config = { responseModalities : [ Modality . AUDIO ], proactivity : { proactiveAudio : true } } Note that proactive audio is currently only supported by the native audio output models. Native audio output with thinking Native audio output supports thinking capabilities , available via a separate model gemini-2.5-flash-exp-native-audio-thinking-dialog . See Send and receive audio for a full example. Python model = "gemini-2.5-flash-exp-native-audio-thinking-dialog" config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ]) async with client . aio . live . connect ( model = model , config = config ) as session : # Send audio input and receive audio JavaScript const model = 'gemini-2.5-flash-exp-native-audio-thinking-dialog' ; const config = { responseModalities : [ Modality . AUDIO ] }; async function main () { const session = await ai . live . connect ({ model : model , config : config , callbacks : ..., }); // Send audio input and receive audio session . close (); } main (); Voice Activity Detection (VAD) Voice Activity Detection (VAD) allows the model to recognize when a person is speaking. This is essential for creating natural conversations, as it allows a user to interrupt the model at any time. When VAD detects an interruption, the ongoing generation is canceled and discarded. Only the information already sent to the client is retained in the session history. The server then sends a BidiGenerateContentServerContent message to report the interruption. 
The Gemini server then discards any pending function calls and sends a BidiGenerateContentServerContent message with the IDs of the canceled calls. Python async for response in session . receive (): if response . server_content . interrupted is True : # The generation was interrupted # If realtime playback is implemented in your application, # you should stop playing audio and clear queued playback here. JavaScript const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . interrupted ) { // The generation was interrupted // If realtime playback is implemented in your application, // you should stop playing audio and clear queued playback here. } } Automatic VAD By default, the model automatically performs VAD on a continuous audio input stream. VAD can be configured with the realtimeInputConfig.automaticActivityDetection field of the setup configuration . When the audio stream is paused for more than a second (for example, because the user switched off the microphone), an audioStreamEnd event should be sent to flush any cached audio. The client can resume sending audio data at any time. Python # example audio file to try: # URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm" # !wget -q $URL -O sample.pcm import asyncio from pathlib import Path from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : audio_bytes = Path ( "sample.pcm" ) . read_bytes () await session . send_realtime_input ( audio = types . Blob ( data = audio_bytes , mime_type = "audio/pcm;rate=16000" ) ) # if stream gets paused, send: # await session.send_realtime_input(audio_stream_end=True) async for response in session . receive (): if response . text is not None : print ( response . 
text ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript // example audio file to try: // URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm" // !wget -q $URL -O sample.pcm import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . TEXT ] }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send Audio Chunk const fileBuffer = fs . readFileSync ( "sample.pcm" ); const base64Audio = Buffer . from ( fileBuffer ). toString ( 'base64' ); session . sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); // if stream gets paused, send: // session.sendRealtimeInput({ audioStreamEnd: true }) const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . text ) { console . debug ( 'Received text: %s\n' , turn . text ); } else if ( turn . data ) { console . debug ( 'Received inline data: %s\n' , turn . data ); } } session . 
close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); With send_realtime_input , the API will respond to audio automatically based on VAD. While send_client_content adds messages to the model context in order, send_realtime_input is optimized for responsiveness at the expense of deterministic ordering. Automatic VAD configuration For more control over the VAD activity, you can configure the following parameters. See API reference for more info. Python from google.genai import types config = { "response_modalities" : [ "TEXT" ], "realtime_input_config" : { "automatic_activity_detection" : { "disabled" : False , # default "start_of_speech_sensitivity" : types . StartSensitivity . START_SENSITIVITY_LOW , "end_of_speech_sensitivity" : types . EndSensitivity . END_SENSITIVITY_LOW , "prefix_padding_ms" : 20 , "silence_duration_ms" : 100 , } } } JavaScript import { GoogleGenAI , Modality , StartSensitivity , EndSensitivity } from '@google/genai' ; const config = { responseModalities : [ Modality . TEXT ], realtimeInputConfig : { automaticActivityDetection : { disabled : false , // default startOfSpeechSensitivity : StartSensitivity . START_SENSITIVITY_LOW , endOfSpeechSensitivity : EndSensitivity . END_SENSITIVITY_LOW , prefixPaddingMs : 20 , silenceDurationMs : 100 , } } }; Disable automatic VAD Alternatively, the automatic VAD can be disabled by setting realtimeInputConfig.automaticActivityDetection.disabled to true in the setup message. In this configuration the client is responsible for detecting user speech and sending activityStart and activityEnd messages at the appropriate times. An audioStreamEnd isn't sent in this configuration. Instead, any interruption of the stream is marked by an activityEnd message. Python config = { "response_modalities" : [ "TEXT" ], "realtime_input_config" : { "automatic_activity_detection" : { "disabled" : True }}, } async with client . aio . live . 
connect ( model = model , config = config ) as session : # ... await session . send_realtime_input ( activity_start = types . ActivityStart ()) await session . send_realtime_input ( audio = types . Blob ( data = audio_bytes , mime_type = "audio/pcm;rate=16000" ) ) await session . send_realtime_input ( activity_end = types . ActivityEnd ()) # ... JavaScript const config = { responseModalities : [ Modality . TEXT ], realtimeInputConfig : { automaticActivityDetection : { disabled : true , } } }; session . sendRealtimeInput ({ activityStart : {} }) session . sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); session . sendRealtimeInput ({ activityEnd : {} }) Token count You can find the total number of consumed tokens in the usageMetadata field of the returned server message. Python async for message in session . receive (): # The server will periodically send messages that include UsageMetadata. if message . usage_metadata : usage = message . usage_metadata print ( f "Used { usage . total_token_count } tokens in total. Response token breakdown:" ) for detail in usage . response_tokens_details : match detail : case types . ModalityTokenCount ( modality = modality , token_count = count ): print ( f " { modality } : { count } " ) JavaScript const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . usageMetadata ) { console . debug ( 'Used %s tokens in total. Response token breakdown:\n' , turn . usageMetadata . totalTokenCount ); for ( const detail of turn . usageMetadata . responseTokensDetails ) { console . debug ( '%s\n' , detail ); } } } Media resolution You can specify the media resolution for the input media by setting the mediaResolution field as part of the session configuration: Python from google.genai import types config = { "response_modalities" : [ "AUDIO" ], "media_resolution" : types . MediaResolution . 
MEDIA_RESOLUTION_LOW , } JavaScript import { GoogleGenAI , Modality , MediaResolution } from '@google/genai' ; const config = { responseModalities : [ Modality . TEXT ], mediaResolution : MediaResolution . MEDIA_RESOLUTION_LOW , };

Limitations

Consider the following limitations of the Live API when you plan your project.

Response modalities

You can set only one response modality (TEXT or AUDIO) per session in the session configuration. Setting both results in a config error message, so the model can respond with either text or audio, but not both, in the same session.

Client authentication

The Live API only provides server-to-server authentication by default. If you're implementing your Live API application using a client-to-server approach, you need to use ephemeral tokens to mitigate security risks.

Session duration

Audio-only sessions are limited to 15 minutes, and audio plus video sessions are limited to 2 minutes. However, you can configure session management techniques to extend a session beyond these limits without a fixed upper bound.

Context window

A session has a context window limit of:
128k tokens for native audio output models
32k tokens for other Live API models

Supported languages

The Live API supports the following languages.

Note: Native audio output models automatically choose the appropriate language and don't support explicitly setting the language code.
Language                    BCP-47 Code
German (Germany)            de-DE
English (Australia)*        en-AU
English (UK)*               en-GB
English (India)             en-IN
English (US)                en-US
Spanish (US)                es-US
French (France)             fr-FR
Hindi (India)               hi-IN
Portuguese (Brazil)         pt-BR
Arabic (Generic)            ar-XA
Spanish (Spain)*            es-ES
French (Canada)*            fr-CA
Indonesian (Indonesia)      id-ID
Italian (Italy)             it-IT
Japanese (Japan)            ja-JP
Turkish (Turkey)            tr-TR
Vietnamese (Vietnam)        vi-VN
Bengali (India)             bn-IN
Gujarati (India)*           gu-IN
Kannada (India)*            kn-IN
Marathi (India)             mr-IN
Malayalam (India)*          ml-IN
Tamil (India)               ta-IN
Telugu (India)              te-IN
Dutch (Netherlands)         nl-NL
Korean (South Korea)        ko-KR
Mandarin Chinese (China)*   cmn-CN
Polish (Poland)             pl-PL
Russian (Russia)            ru-RU
Thai (Thailand)             th-TH

Languages marked with an asterisk (*) are not available for native audio.

What's next

Read the Tool Use and Session Management guides for essential information on using the Live API effectively.
Try the Live API in Google AI Studio.
For more info about the Live API models, see Gemini 2.0 Flash Live and Gemini 2.5 Flash Native Audio on the Models page.
Try more examples in the Live API cookbook, the Live API Tools cookbook, and the Live API Get Started script.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. Last updated 2025-07-08 UTC.
|
text_content/docs_live-tools_141b2194.txt
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/live-tools
|
| 2 |
+
Title: Tool use with Live API | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Tool use with Live API

Tool use allows the Live API to go beyond conversation by enabling it to perform actions in the real world and pull in external context while maintaining a real-time connection. You can define tools such as Function calling, Code execution, and Google Search with the Live API.

Overview of supported tools

Here's a brief overview of the available tools for each model:

Tool               Cascaded models (gemini-live-2.5-flash-preview, gemini-2.0-flash-live-001)   gemini-2.5-flash-preview-native-audio-dialog   gemini-2.5-flash-exp-native-audio-thinking-dialog
Search             Yes                                                                          Yes                                            Yes
Function calling   Yes                                                                          Yes                                            No
Code execution     Yes                                                                          No                                             No
URL context        Yes                                                                          No                                             No

Function calling

The Live API supports function calling, just like regular content generation requests. Function calling lets the Live API interact with external data and programs, greatly increasing what your applications can accomplish. You can define function declarations as part of the session configuration. After receiving tool calls, the client should respond with a list of FunctionResponse objects using the session.send_tool_response method. See the Function calling tutorial to learn more.

Note: Unlike the generateContent API, the Live API doesn't support automatic tool response handling. You must handle tool responses manually in your client code.

Python import asyncio from google import genai from google.genai import types client = genai . 
Client () model = "gemini-live-2.5-flash-preview" # Simple function definitions turn_on_the_lights = { "name" : "turn_on_the_lights" } turn_off_the_lights = { "name" : "turn_off_the_lights" } tools = [{ "function_declarations" : [ turn_on_the_lights , turn_off_the_lights ]}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "Turn on the lights please" await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) elif chunk . tool_call : function_responses = [] for fc in chunk . tool_call . function_calls : function_response = types . FunctionResponse ( id = fc . id , name = fc . name , response = { "result" : "ok" } # simple, hard-coded function response ) function_responses . append ( function_response ) await session . send_tool_response ( function_responses = function_responses ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; // Simple function definitions const turn_on_the_lights = { name : "turn_on_the_lights" } // , description: '...', parameters: { ... } const turn_off_the_lights = { name : "turn_off_the_lights" } const tools = [{ functionDeclarations : [ turn_on_the_lights , turn_off_the_lights ] }] const config = { responseModalities : [ Modality . TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . 
shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Turn on the lights please' ; session . sendClientContent ({ turns : inputTurns }); let turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } } } else if ( turn . toolCall ) { const functionResponses = []; for ( const fc of turn . toolCall . functionCalls ) { functionResponses . push ({ id : fc . id , name : fc . name , response : { result : "ok" } // simple, hard-coded function response }); } console . debug ( 'Sending tool response...\n' ); session . sendToolResponse ({ functionResponses : functionResponses }); } } // Check again for new messages turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } } } } session . 
close (); } async function main () { await live (). catch (( e ) => console . error ( 'got error' , e )); } main ();

From a single prompt, the model can generate multiple function calls and the code necessary to chain their outputs. This code executes in a sandbox environment, generating subsequent BidiGenerateContentToolCall messages.

Asynchronous function calling

Note: Asynchronous function calling is only supported in half-cascade audio generation.

Function calling executes sequentially by default: execution pauses until the results of each function call are available, so you can't continue interacting with the model while the functions are running. If you don't want to block the conversation, you can tell the model to run the functions asynchronously. To do so, first add a behavior to the function definitions:

Python
# Non-blocking function definitions
turn_on_the_lights = {"name": "turn_on_the_lights", "behavior": "NON_BLOCKING"}  # turn_on_the_lights will run asynchronously
turn_off_the_lights = {"name": "turn_off_the_lights"}  # turn_off_the_lights will still pause all interactions with the model

JavaScript
import { GoogleGenAI, Modality, Behavior } from '@google/genai';
// Non-blocking function definitions
const turn_on_the_lights = { name: "turn_on_the_lights", behavior: Behavior.NON_BLOCKING };
// Blocking function definitions
const turn_off_the_lights = { name: "turn_off_the_lights" };
const tools = [{ functionDeclarations: [turn_on_the_lights, turn_off_the_lights] }];

NON_BLOCKING ensures the function runs asynchronously while you can continue interacting with the model. Then you need to tell the model how to behave when it receives the FunctionResponse using the scheduling parameter. 
It can either: Interrupt what it's doing and tell you about the response it got right away ( scheduling="INTERRUPT" ), Wait until it's finished with what it's currently doing ( scheduling="WHEN_IDLE" ), Or do nothing and use that knowledge later on in the discussion ( scheduling="SILENT" ) Python # for a non-blocking function definition, apply scheduling in the function response: function_response = types . FunctionResponse ( id = fc . id , name = fc . name , response = { "result" : "ok" , "scheduling" : "INTERRUPT" # Can also be WHEN_IDLE or SILENT } ) JavaScript import { GoogleGenAI , Modality , Behavior , FunctionResponseScheduling } from '@google/genai' ; // for a non-blocking function definition, apply scheduling in the function response: const functionResponse = { id : fc . id , name : fc . name , response : { result : "ok" , scheduling : FunctionResponseScheduling . INTERRUPT // Can also be WHEN_IDLE or SILENT } } Code execution You can define code execution as part of the session configuration. This lets the Live API generate and execute Python code and dynamically perform computations to benefit your results. See the Code execution tutorial to learn more. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" tools = [{ 'code_execution' : {}}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "Compute the largest prime palindrome under 100000." await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) model_turn = chunk . server_content . model_turn if model_turn : for part in model_turn . parts : if part . executable_code is not None : print ( part . executable_code . code ) if part . 
code_execution_result is not None : print ( part . code_execution_result . output ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const tools = [{ codeExecution : {}}] const config = { responseModalities : [ Modality . TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Compute the largest prime palindrome under 100000.' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } else if ( part . executableCode ) { console . debug ( 'executableCode: %s\n' , part . executableCode . code ); } else if ( part . 
codeExecutionResult ) { console . debug ( 'codeExecutionResult: %s\n' , part . codeExecutionResult . output ); } } } } session . close (); } async function main () { await live (). catch (( e ) => console . error ( 'got error' , e )); } main (); Grounding with Google Search You can enable Grounding with Google Search as part of the session configuration. This can improve the accuracy of the Live API's responses and helps reduce hallucinations. See the Grounding tutorial to learn more. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" tools = [{ 'google_search' : {}}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "When did the last Brazil vs. Argentina soccer match happen?" await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) # The model might generate and execute Python code to use Search model_turn = chunk . server_content . model_turn if model_turn : for part in model_turn . parts : if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . output ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const tools = [{ googleSearch : {}}] const config = { responseModalities : [ Modality . TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . 
shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'When did the last Brazil vs. Argentina soccer match happen?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } else if ( part . executableCode ) { console . debug ( 'executableCode: %s\n' , part . executableCode . code ); } else if ( part . codeExecutionResult ) { console . debug ( 'codeExecutionResult: %s\n' , part . codeExecutionResult . output ); } } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Combining multiple tools You can combine multiple tools within the Live API, increasing your application's capabilities even more: Python prompt = """ Hey, I need you to do three things for me. 1. Compute the largest prime palindrome under 100000. 2. 
Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024? 3. Turn on the lights Thanks! """ tools = [ { "google_search" : {}}, { "code_execution" : {}}, { "function_declarations" : [ turn_on_the_lights , turn_off_the_lights ]}, ] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } # ... remaining model call JavaScript const prompt = `Hey, I need you to do three things for me. 1. Compute the largest prime palindrome under 100000. 2. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024? 3. Turn on the lights Thanks! ` const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turn_on_the_lights , turn_off_the_lights ] } ] const config = { responseModalities : [ Modality . TEXT ], tools : tools } // ... remaining model call What's next Check out more examples of using tools with the Live API in the Tool use cookbook . Get the full story on features and configurations from the Live API Capabilities guide . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-08 UTC.
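The asynchronous function calling section above attaches a scheduling value to the response of a NON_BLOCKING function. The following is a minimal sketch of that idea using plain dictionaries only; it does not call the SDK. The helper name build_function_response, the DECLARATIONS lookup table, the WHEN_IDLE default, and the choice to omit scheduling for blocking functions are assumptions made for illustration, not part of the Live API.

```python
from typing import Any

# Scheduling values named in the asynchronous function calling section.
VALID_SCHEDULING = {"INTERRUPT", "WHEN_IDLE", "SILENT"}

# Function declarations as shown in that section: only turn_on_the_lights
# is declared non-blocking.
turn_on_the_lights = {"name": "turn_on_the_lights", "behavior": "NON_BLOCKING"}
turn_off_the_lights = {"name": "turn_off_the_lights"}

# Hypothetical lookup table so the helper can check a function's behavior.
DECLARATIONS = {d["name"]: d for d in (turn_on_the_lights, turn_off_the_lights)}


def build_function_response(call_id: str, name: str, result: Any,
                            scheduling: str = "WHEN_IDLE") -> dict:
    """Hypothetical helper: build a plain-dict function response payload.

    The scheduling key is added only for functions declared NON_BLOCKING,
    mirroring the shape of the response shown in the section above.
    """
    if name not in DECLARATIONS:
        raise KeyError(f"unknown function: {name}")
    response = {"result": result}
    if DECLARATIONS[name].get("behavior") == "NON_BLOCKING":
        if scheduling not in VALID_SCHEDULING:
            raise ValueError(f"unsupported scheduling: {scheduling}")
        response["scheduling"] = scheduling
    return {"id": call_id, "name": name, "response": response}


if __name__ == "__main__":
    print(build_function_response("call-1", "turn_on_the_lights", "ok", "INTERRUPT"))
    print(build_function_response("call-2", "turn_off_the_lights", "ok"))
```

In a real session, the id and name would come from the function call received from the model, and the resulting fields would be passed to the SDK's function response type rather than printed.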
|
text_content/docs_live_1db6d544.txt
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/live#main-content
|
| 2 |
+
Title: Get started with Live API | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Get started with Live API Preview: The Live API is in preview. The Live API enables low-latency, real-time voice and video interactions with Gemini. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses, creating a natural conversational experience for your users. The Live API offers a comprehensive set of features such as Voice Activity Detection, tool use and function calling, session management (for managing long-running conversations), and ephemeral tokens (for secure client-side authentication). This page gets you up and running with examples and basic code samples. Example applications Check out the following example applications that illustrate how to use the Live API for end-to-end use cases: Live audio starter app on AI Studio, using JavaScript libraries to connect to the Live API and stream bidirectional audio through your microphone and speakers. Live API Python cookbook using PyAudio that connects to the Live API. Partner integrations If you prefer a simpler development process, you can use Daily or LiveKit. These are third-party partner platforms that have already integrated the Gemini Live API over the WebRTC protocol to streamline the development of real-time audio and video applications. Before you begin building There are two important decisions to make before you begin building with the Live API: choosing a model and choosing an implementation approach.
Choose an audio generation architecture If you're building an audio-based use case, your choice of model determines the audio generation architecture used to create the audio response: Native audio: This option provides the most natural and realistic-sounding speech and better multilingual performance. It also enables advanced features like affective (emotion-aware) dialogue, proactive audio (where the model can decide to ignore or respond to certain inputs), and "thinking". Native audio is supported by the following native audio models: gemini-2.5-flash-preview-native-audio-dialog and gemini-2.5-flash-exp-native-audio-thinking-dialog. Half-cascade audio: This option uses a cascaded model architecture (native audio input and text-to-speech output). It offers better performance and reliability in production environments, especially with tool use. Half-cascade audio is supported by the following models: gemini-live-2.5-flash-preview and gemini-2.0-flash-live-001. Choose an implementation approach When integrating with the Live API, you'll need to choose one of the following implementation approaches: Server-to-server: Your backend connects to the Live API using WebSockets. Typically, your client sends stream data (audio, video, text) to your server, which then forwards it to the Live API. Client-to-server: Your frontend code connects directly to the Live API using WebSockets to stream data, bypassing your backend. Note: Client-to-server generally offers better performance for streaming audio and video, since it bypasses the need to send the stream to your backend first. It's also easier to set up, since you don't need to implement a proxy that sends data from your client to your server and then from your server to the API. However, for production environments, we recommend using ephemeral tokens instead of standard API keys to mitigate security risks. Get started This example reads a WAV file, sends it in the correct format, and saves the received data as a WAV file.
You can send audio by converting it to 16-bit PCM, 16kHz, mono format, and you can receive audio by setting AUDIO as response modality. The output uses a sample rate of 24kHz. Python # Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav # Install helpers for converting files: pip install librosa soundfile import asyncio import io from pathlib import Path import wave from google import genai from google.genai import types import soundfile as sf import librosa client = genai . Client () # Half cascade model: # model = "gemini-live-2.5-flash-preview" # Native audio output model: model = "gemini-2.5-flash-preview-native-audio-dialog" config = { "response_modalities" : [ "AUDIO" ], "system_instruction" : "You are a helpful assistant and answer in a friendly tone." , } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : buffer = io . BytesIO () y , sr = librosa . load ( "sample.wav" , sr = 16000 ) sf . write ( buffer , y , sr , format = 'RAW' , subtype = 'PCM_16' ) buffer . seek ( 0 ) audio_bytes = buffer . read () # If already in correct format, you can use this: # audio_bytes = Path("sample.pcm").read_bytes() await session . send_realtime_input ( audio = types . Blob ( data = audio_bytes , mime_type = "audio/pcm;rate=16000" ) ) wf = wave . open ( "audio.wav" , "wb" ) wf . setnchannels ( 1 ) wf . setsampwidth ( 2 ) wf . setframerate ( 24000 ) # Output is 24kHz async for response in session . receive (): if response . data is not None : wf . writeframes ( response . data ) # Un-comment this code to print audio data info # if response.server_content.model_turn is not None: # print(response.server_content.model_turn.parts[0].inline_data.mime_type) wf . close () if __name__ == "__main__" : asyncio . 
run ( main ()) JavaScript // Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; import pkg from 'wavefile' ; // npm install wavefile const { WaveFile } = pkg ; const ai = new GoogleGenAI ({}); // WARNING: Do not use API keys in client-side (browser based) applications // Consider using Ephemeral Tokens instead // More information at: https://ai.google.dev/gemini-api/docs/ephemeral-tokens // Half cascade model: // const model = "gemini-live-2.5-flash-preview" // Native audio output model: const model = "gemini-2.5-flash-preview-native-audio-dialog" const config = { responseModalities : [ Modality . AUDIO ], systemInstruction : "You are a helpful assistant and answer in a friendly tone." }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) => setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send Audio Chunk const fileBuffer = fs . readFileSync ( "sample.wav" ); // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono) const wav = new WaveFile (); wav . fromBuffer ( fileBuffer ); wav . 
toSampleRate ( 16000 ); wav . toBitDepth ( "16" ); const base64Audio = wav . toBase64 (); // If already in correct format, you can use this: // const fileBuffer = fs.readFileSync("sample.pcm"); // const base64Audio = Buffer.from(fileBuffer).toString('base64'); session . sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); const turns = await handleTurn (); // Combine audio data strings and save as wave file const combinedAudio = turns . reduce (( acc , turn ) => { if ( turn . data ) { const buffer = Buffer . from ( turn . data , 'base64' ); const intArray = new Int16Array ( buffer . buffer , buffer . byteOffset , buffer . byteLength / Int16Array . BYTES_PER_ELEMENT ); return acc . concat ( Array . from ( intArray )); } return acc ; }, []); const audioBuffer = new Int16Array ( combinedAudio ); const wf = new WaveFile (); wf . fromScratch ( 1 , 24000 , '16' , audioBuffer ); // output is 24kHz fs . writeFileSync ( 'audio.wav' , wf . toBuffer ()); session . close (); } async function main () { await live (). catch (( e ) => console . error ( 'got error' , e )); } main (); What's next Read the full Live API Capabilities guide for key capabilities and configurations, including Voice Activity Detection and native audio features. Read the Tool use guide to learn how to integrate the Live API with tools and function calling. Read the Session management guide for managing long-running conversations. Read the Ephemeral tokens guide for secure authentication in client-to-server applications. For more information about the underlying WebSockets API, see the WebSockets API reference. Last updated 2025-07-08 UTC.
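The Get started section states that input audio should be 16-bit PCM, 16kHz, mono, and that output audio arrives at 24kHz. As a rough illustration of those two formats, here is a sketch that uses only the Python standard library and makes no API calls. The function names floats_to_pcm16 and pcm16_to_wav are hypothetical, and the sine tone merely stands in for real microphone input.

```python
import io
import math
import struct
import wave

INPUT_RATE = 16000   # input format from the text: 16-bit PCM, 16kHz, mono
OUTPUT_RATE = 24000  # output sample rate stated in the text


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)  # clamp out-of-range values
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_to_wav(pcm_bytes, rate=OUTPUT_RATE):
    """Wrap raw mono 16-bit PCM in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(rate)
        wf.writeframes(pcm_bytes)
    return buffer.getvalue()


if __name__ == "__main__":
    # One second of a 440 Hz tone standing in for recorded input.
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / INPUT_RATE)
            for n in range(INPUT_RATE)]
    pcm = floats_to_pcm16(tone)
    print(len(pcm), "bytes of PCM, suitable for mime type audio/pcm;rate=16000")
    # Received audio chunks would be concatenated and wrapped at 24kHz instead.
    wav_bytes = pcm16_to_wav(pcm, rate=INPUT_RATE)
    print(len(wav_bytes), "bytes of WAV")
```

The examples in the page rely on librosa and soundfile (Python) or wavefile (JavaScript) for resampling; this sketch skips resampling entirely and assumes the samples are already at the target rate.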
|
text_content/docs_live_9c2b1926.txt
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/live#response-modalities
|
| 2 |
+
Title: Get started with Live API | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
|
text_content/docs_live_e7a4e219.txt
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/live#implementation-approach
|
| 2 |
+
Title: Get started with Live API | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Get started with Live API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Get started with Live API Preview: The Live API is in preview. The Live API enables low-latency, real-time voice and video interactions with Gemini. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses, creating a natural conversational experience for your users. Live API offers a comprehensive set of features such as Voice Activity Detection , tool use and function calling , session management (for managing long running conversations) and ephemeral tokens (for secure client-sided authentication). This page gets you up and running with examples and basic code samples. Example applications Check out the following example applications that illustrate how to use Live API for end-to-end use cases: Live audio starter app on AI Studio, using JavaScript libraries to connect to Live API and stream bidirectional audio through your microphone and speakers. Live API Python cookbook using Pyaudio that connects to Live API. Partner integrations If you prefer a simpler development process, you can use Daily or LiveKit . These are third-party partner platforms that have already integrated the Gemini Live API over the WebRTC protocol to streamline the development of real-time audio and video applications. Before you begin building There are two important decisions to make before you begin building with the Live API: choosing a model and choosing an implementation approach. 
Choose an audio generation architecture If you're building an audio-based use case, your choice of model determines the audio generation architecture used to create the audio response: Native audio : This option provides the most natural and realistic-sounding speech and better multilingual performance. It also enables advanced features like affective (emotion-aware) dialogue , proactive audio (where the model can decide to ignore or respond to certain inputs), and "thinking" . Native audio is supported by the following native audio models : gemini-2.5-flash-preview-native-audio-dialog gemini-2.5-flash-exp-native-audio-thinking-dialog Half-cascade audio : This option uses a cascaded model architecture (native audio input and text-to-speech output). It offers better performance and reliability in production environments, especially with tool use . Half-cascaded audio is supported by the following models: gemini-live-2.5-flash-preview gemini-2.0-flash-live-001 Choose an implementation approach When integrating with Live API, you'll need to choose one of the following implementation approaches: Server-to-server : Your backend connects to the Live API using WebSockets . Typically, your client sends stream data (audio, video, text) to your server, which then forwards it to the Live API. Client-to-server : Your frontend code connects directly to the Live API using WebSockets to stream data, bypassing your backend. Note: Client-to-server generally offers better performance for streaming audio and video, since it bypasses the need to send the stream to your backend first. It's also easier to set up since you don't need to implement a proxy that sends data from your client to your server and then your server to the API. However, for production environments, in order to mitigate security risks, we recommend using ephemeral tokens instead of standard API keys. Get started This example reads a WAV file , sends it in the correct format, and saves the received data as WAV file. 
You can send audio by converting it to 16-bit PCM, 16kHz, mono format, and you can receive audio by setting AUDIO as response modality. The output uses a sample rate of 24kHz. Python # Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav # Install helpers for converting files: pip install librosa soundfile import asyncio import io from pathlib import Path import wave from google import genai from google.genai import types import soundfile as sf import librosa client = genai . Client () # Half cascade model: # model = "gemini-live-2.5-flash-preview" # Native audio output model: model = "gemini-2.5-flash-preview-native-audio-dialog" config = { "response_modalities" : [ "AUDIO" ], "system_instruction" : "You are a helpful assistant and answer in a friendly tone." , } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : buffer = io . BytesIO () y , sr = librosa . load ( "sample.wav" , sr = 16000 ) sf . write ( buffer , y , sr , format = 'RAW' , subtype = 'PCM_16' ) buffer . seek ( 0 ) audio_bytes = buffer . read () # If already in correct format, you can use this: # audio_bytes = Path("sample.pcm").read_bytes() await session . send_realtime_input ( audio = types . Blob ( data = audio_bytes , mime_type = "audio/pcm;rate=16000" ) ) wf = wave . open ( "audio.wav" , "wb" ) wf . setnchannels ( 1 ) wf . setsampwidth ( 2 ) wf . setframerate ( 24000 ) # Output is 24kHz async for response in session . receive (): if response . data is not None : wf . writeframes ( response . data ) # Un-comment this code to print audio data info # if response.server_content.model_turn is not None: # print(response.server_content.model_turn.parts[0].inline_data.mime_type) wf . close () if __name__ == "__main__" : asyncio . 
run ( main ()) JavaScript // Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; import pkg from 'wavefile' ; // npm install wavefile const { WaveFile } = pkg ; const ai = new GoogleGenAI ({}); // WARNING: Do not use API keys in client-side (browser based) applications // Consider using Ephemeral Tokens instead // More information at: https://ai.google.dev/gemini-api/docs/ephemeral-tokens // Half cascade model: // const model = "gemini-live-2.5-flash-preview" // Native audio output model: const model = "gemini-2.5-flash-preview-native-audio-dialog" const config = { responseModalities : [ Modality . AUDIO ], systemInstruction : "You are a helpful assistant and answer in a friendly tone." }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send Audio Chunk const fileBuffer = fs . readFileSync ( "sample.wav" ); // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono) const wav = new WaveFile (); wav . fromBuffer ( fileBuffer ); wav . 
toSampleRate ( 16000 ); wav . toBitDepth ( "16" ); const base64Audio = wav . toBase64 (); // If already in correct format, you can use this: // const fileBuffer = fs.readFileSync("sample.pcm"); // const base64Audio = Buffer.from(fileBuffer).toString('base64'); session . sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); const turns = await handleTurn (); // Combine audio data strings and save as wave file const combinedAudio = turns . reduce (( acc , turn ) => { if ( turn . data ) { const buffer = Buffer . from ( turn . data , 'base64' ); const intArray = new Int16Array ( buffer . buffer , buffer . byteOffset , buffer . byteLength / Int16Array . BYTES_PER_ELEMENT ); return acc . concat ( Array . from ( intArray )); } return acc ; }, []); const audioBuffer = new Int16Array ( combinedAudio ); const wf = new WaveFile (); wf . fromScratch ( 1 , 24000 , '16' , audioBuffer ); // output is 24kHz fs . writeFileSync ( 'audio.wav' , wf . toBuffer ()); session . close (); } async function main () { await live (). catch (( e ) => console . error ( 'got error' , e )); } main (); What's next Read the full Live API Capabilities guide for key capabilities and configurations, including Voice Activity Detection and native audio features. Read the Tool use guide to learn how to integrate Live API with tools and function calling. Read the Session management guide for managing long-running conversations. Read the Ephemeral tokens guide for secure authentication in client-to-server applications. For more information about the underlying WebSockets API, see the WebSockets API reference. Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-08 UTC.
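The Get started example above depends on two audio formats: input must be 16-bit PCM, 16 kHz, mono, and output arrives as raw PCM at 24 kHz. The following is a minimal sketch, using only the Python standard library, of how one might check a WAV file before sending it and wrap received PCM bytes in a WAV container. The helper names wav_matches_input_format and pcm_to_wav_bytes are hypothetical and not part of the Google GenAI SDK, and the sketch assumes the received audio is 16-bit mono, as the sample code above does.

```python
import io
import wave

# Values taken from the text above; the helper names below are hypothetical.
INPUT_RATE_HZ = 16000   # Live API input: 16-bit PCM, 16 kHz, mono
OUTPUT_RATE_HZ = 24000  # Live API audio output sample rate


def wav_matches_input_format(wav_bytes: bytes) -> bool:
    """Return True if the WAV data is already 16-bit, mono, 16 kHz."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return (
            wf.getnchannels() == 1
            and wf.getsampwidth() == 2  # 2 bytes per sample = 16-bit
            and wf.getframerate() == INPUT_RATE_HZ
        )


def pcm_to_wav_bytes(pcm: bytes, rate_hz: int = OUTPUT_RATE_HZ) -> bytes:
    """Wrap raw 16-bit mono PCM bytes in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate_hz)
        wf.writeframes(pcm)
    return buf.getvalue()
```

If the check returns False, the file still needs resampling or bit-depth conversion (for example with librosa and soundfile, as in the Python sample above) before it is sent with the audio/pcm;rate=16000 MIME type.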
|
text_content/docs_migrate_8812abe1.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/migrate#main-content
|
| 2 |
+
Title: Migrate to the Google GenAI SDK | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Migrate to the Google GenAI SDK Starting with the Gemini 2.0 release in late 2024, we introduced a new set of libraries called the Google GenAI SDK. It offers an improved developer experience through an updated client architecture, and simplifies the transition between developer and enterprise workflows. The Google GenAI SDK is now in General Availability (GA) across all supported platforms. If you're using one of our legacy libraries, we strongly recommend that you migrate. This guide provides before-and-after examples of migrated code to help you get started. Note: The Go examples omit imports and other boilerplate code to improve readability. Installation Before Python pip install -U -q "google-generativeai" JavaScript npm install @google/generative-ai Go go get github.com/google/generative-ai-go After Python pip install -U -q "google-genai" JavaScript npm install @google/genai Go go get google.golang.org/genai API access The old SDK implicitly handled the API client behind the scenes using a variety of ad hoc methods. This made it hard to manage the client and credentials. Now, you interact through a central Client object. This Client object acts as a single entry point for various API services (e.g., models, chats, files, tunings), promoting consistency and simplifying credential and configuration management across different API calls. Before (Less Centralized API Access) Python The old SDK didn't explicitly use a top-level client object for most API calls. You would directly instantiate and interact with GenerativeModel objects.
import google.generativeai as genai # Directly create and use model objects model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ( ... ) chat = model . start_chat ( ... ) JavaScript While GoogleGenerativeAI was a central point for models and chat, other functionalities like file and cache management often required importing and instantiating entirely separate client classes. import { GoogleGenerativeAI } from "@google/generative-ai" ; import { GoogleAIFileManager , GoogleAICacheManager } from "@google/generative-ai/server" ; // For files/caching const genAI = new GoogleGenerativeAI ( "YOUR_API_KEY" ); const fileManager = new GoogleAIFileManager ( "YOUR_API_KEY" ); const cacheManager = new GoogleAICacheManager ( "YOUR_API_KEY" ); // Get a model instance, then call methods on it const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); const result = await model . generateContent (...); const chat = model . startChat (...); // Call methods on separate client objects for other services const uploadedFile = await fileManager . uploadFile (...); const cache = await cacheManager . create (...); Go The genai.NewClient function created a client, but generative model operations were typically called on a separate GenerativeModel instance obtained from this client. Other services might have been accessed via distinct packages or patterns. import ( "github.com/google/generative-ai-go/genai" "github.com/google/generative-ai-go/genai/fileman" // For files "google.golang.org/api/option" ) client , err := genai . NewClient ( ctx , option . WithAPIKey ( "YOUR_API_KEY" )) fileClient , err := fileman . NewClient ( ctx , option . WithAPIKey ( "YOUR_API_KEY" )) // Get a model instance, then call methods on it model := client . GenerativeModel ( "gemini-1.5-flash" ) resp , err := model . GenerateContent ( ... ) cs := model . 
StartChat () // Call methods on separate client objects for other services uploadedFile , err := fileClient . UploadFile ( ... ) After (Centralized Client Object) Python from google import genai # Create a single client object client = genai . Client () # Access API methods through services on the client object response = client . models . generate_content ( ... ) chat = client . chats . create ( ... ) my_file = client . files . upload ( ... ) tuning_job = client . tunings . tune ( ... ) JavaScript import { GoogleGenAI } from "@google/genai" ; // Create a single client object const ai = new GoogleGenAI ({ apiKey : "YOUR_API_KEY" }); // Access API methods through services on the client object const response = await ai . models . generateContent (...); const chat = ai . chats . create (...); const uploadedFile = await ai . files . upload (...); const cache = await ai . caches . create (...); Go import "google.golang.org/genai" // Create a single client object client , err := genai . NewClient ( ctx , nil ) // Access API methods through services on the client object result , err := client . Models . GenerateContent ( ... ) chat , err := client . Chats . Create ( ... ) uploadedFile , err := client . Files . Upload ( ... ) tuningJob , err := client . Tunings . Tune ( ... ) Authentication Both legacy and new libraries authenticate using API keys. You can create your API key in Google AI Studio. Before Python The old SDK handled the API client object implicitly. import google.generativeai as genai genai . configure ( api_key =... ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); Go Import the Google libraries: import ( "github.com/google/generative-ai-go/genai" "google.golang.org/api/option" ) Create the client: client , err := genai . NewClient ( ctx , option . 
WithAPIKey ( "GOOGLE_API_KEY" )) After Python With Google GenAI SDK, you create an API client first, which is used to call the API. The new SDK will pick up your API key from either one of the GEMINI_API_KEY or GOOGLE_API_KEY environment variables, if you don't pass one to the client. export GEMINI_API_KEY = "YOUR_API_KEY" from google import genai client = genai . Client () # Set the API key using the GEMINI_API_KEY env var. # Alternatively, you could set the API key explicitly: # client = genai.Client(api_key="your_api_key") JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : "GEMINI_API_KEY" }); Go Import the GenAI library: import "google.golang.org/genai" Create the client: client , err := genai . NewClient ( ctx , & genai . ClientConfig { Backend : genai . BackendGeminiAPI , }) Generate content Text Before Python Previously, there were no client objects, you accessed APIs directly through GenerativeModel objects. import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ( 'Tell me a story in 300 words' ) print ( response . text ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( process . env . API_KEY ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); const prompt = "Tell me a story in 300 words" ; const result = await model . generateContent ( prompt ); console . log ( result . response . text ()); Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client . Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) resp , err := model . GenerateContent ( ctx , genai . Text ( "Tell me a story in 300 words." )) if err != nil { log . 
Fatal ( err ) } printResponse ( resp ) // utility for printing response parts After Python The new Google GenAI SDK provides access to all the API methods through the Client object. Except for a few stateful special cases ( chat and live-api session s), these are all stateless functions. For utility and uniformity, objects returned are pydantic classes. from google import genai client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'Tell me a story in 300 words.' ) print ( response . text ) print ( response . model_dump_json ( exclude_none = True , indent = 4 )) JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : "Tell me a story in 300 words." , }); console . log ( response . text ); Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , err := client . Models . GenerateContent ( ctx , "gemini-2.0-flash" , genai . Text ( "Tell me a story in 300 words." ), nil ) if err != nil { log . Fatal ( err ) } debugPrint ( result ) // utility for printing result Image Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ([ 'Tell me a story based on this image' , Image . open ( image_path ) ]) print ( response . text ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); function fileToGenerativePart ( path , mimeType ) { return { inlineData : { data : Buffer . from ( fs . readFileSync ( path )). 
toString ( "base64" ), mimeType , }, }; } const prompt = "Tell me a story based on this image" ; const imagePart = fileToGenerativePart ( `path/to/organ.jpg` , "image/jpeg" , ); const result = await model . generateContent ([ prompt , imagePart ]); console . log ( result . response . text ()); Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client . Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) imgData , err := os . ReadFile ( "path/to/organ.jpg" ) if err != nil { log . Fatal ( err ) } resp , err := model . GenerateContent ( ctx , genai . Text ( "Tell me about this instrument" ), genai . ImageData ( "jpeg" , imgData )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) // utility for printing response After Python Many of the same convenience features exist in the new SDK. For example, PIL.Image objects are automatically converted. from google import genai from PIL import Image client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = [ 'Tell me a story based on this image' , Image . open ( image_path ) ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const organ = await ai . files . upload ({ file : "path/to/organ.jpg" , }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : [ createUserContent ([ "Tell me a story based on this image" , createPartFromUri ( organ . uri , organ . mimeType ) ]), ], }); console . log ( response . text ); Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imgData , err := os . ReadFile ( "path/to/organ.jpg" ) if err != nil { log . Fatal ( err ) } parts := [] * genai .
Part { { Text : "Tell me a story based on this image" }, { InlineData : & genai . Blob { Data : imgData , MIMEType : "image/jpeg" }}, } contents := [] * genai . Content { { Parts : parts }, } result , err := client . Models . GenerateContent ( ctx , "gemini-2.0-flash" , contents , nil ) if err != nil { log . Fatal ( err ) } debugPrint ( result ) // utility for printing result Streaming Before Python import google.generativeai as genai response = model . generate_content ( "Write a cute story about cats." , stream = True ) for chunk in response : print ( chunk . text ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); const prompt = "Write a story about a magic backpack." ; const result = await model . generateContentStream ( prompt ); // Print text as it comes in. for await ( const chunk of result . stream ) { const chunkText = chunk . text (); process . stdout . write ( chunkText ); } Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client . Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) iter := model . GenerateContentStream ( ctx , genai . Text ( "Write a story about a magic backpack." )) for { resp , err := iter . Next () if err == iterator . Done { break } if err != nil { log . Fatal ( err ) } printResponse ( resp ) // utility for printing the response } After Python from google import genai client = genai . Client () for chunk in client . models . generate_content_stream ( model = 'gemini-2.0-flash' , contents = 'Tell me a story in 300 words.' ): print ( chunk . text ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . 
generateContentStream ({ model : "gemini-2.0-flash" , contents : "Write a story about a magic backpack." , }); let text = "" ; for await ( const chunk of response ) { console . log ( chunk . text ); text += chunk . text ; } Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } for result , err := range client . Models . GenerateContentStream ( ctx , "gemini-2.0-flash" , genai . Text ( "Write a story about a magic backpack." ), nil , ) { if err != nil { log . Fatal ( err ) } fmt . Print ( result . Candidates [ 0 ]. Content . Parts [ 0 ]. Text ) } Configuration Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' , system_instruction = 'you are a story teller for kids under 5 years old' , generation_config = genai . GenerationConfig ( max_output_tokens = 400 , top_k = 2 , top_p = 0.5 , temperature = 0.5 , response_mime_type = 'application/json' , stop_sequences = [ ' \n ' ], ) ) response = model . generate_content ( 'tell me a story in 100 words' ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" , generationConfig : { candidateCount : 1 , stopSequences : [ "x" ], maxOutputTokens : 20 , temperature : 1.0 , }, }); const result = await model . generateContent ( "Tell me a story about a magic backpack." , ); console . log ( result . response . text ()) Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client . Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) model . SetTemperature ( 0.5 ) model . SetTopP ( 0.5 ) model . SetTopK ( 2.0 ) model . SetMaxOutputTokens ( 100 ) model . ResponseMIMEType = "application/json" resp , err := model . GenerateContent ( ctx , genai . 
Text ( "Tell me about New York" )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) // utility for printing response After Python For all methods in the new SDK, the required arguments are provided as keyword arguments. All optional inputs are provided in the config argument. Config arguments can be specified as either Python dictionaries or Config classes in the google.genai.types namespace. For utility and uniformity, all definitions within the types module are pydantic classes. from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'Tell me a story in 100 words.' , config = types . GenerateContentConfig ( system_instruction = 'you are a story teller for kids under 5 years old' , max_output_tokens = 400 , top_k = 2 , top_p = 0.5 , temperature = 0.5 , response_mime_type = 'application/json' , stop_sequences = [ ' \n ' ], seed = 42 , ), ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : "Tell me a story about a magic backpack." , config : { candidateCount : 1 , stopSequences : [ "x" ], maxOutputTokens : 20 , temperature : 1.0 , }, }); console . log ( response . text ); Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , err := client . Models . GenerateContent ( ctx , "gemini-2.0-flash" , genai . Text ( "Tell me about New York" ), & genai . GenerateContentConfig { Temperature : genai . Ptr [ float32 ]( 0.5 ), TopP : genai . Ptr [ float32 ]( 0.5 ), TopK : genai . Ptr [ float32 ]( 2.0 ), ResponseMIMEType : "application/json" , StopSequences : [] string { "Yankees" }, CandidateCount : 2 , Seed : genai . Ptr [ int32 ]( 42 ), MaxOutputTokens : 128 , PresencePenalty : genai . 
Ptr [ float32 ]( 0.5 ), FrequencyPenalty : genai . Ptr [ float32 ]( 0.5 ), }, ) if err != nil { log . Fatal ( err ) } debugPrint ( result ) // utility for printing response Safety settings Generate a response with safety settings: Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ( 'say something bad' , safety_settings = { 'HATE' : 'BLOCK_ONLY_HIGH' , 'HARASSMENT' : 'BLOCK_ONLY_HIGH' , } ) JavaScript import { GoogleGenerativeAI , HarmCategory , HarmBlockThreshold } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" , safetySettings : [ { category : HarmCategory . HARM_CATEGORY_HARASSMENT , threshold : HarmBlockThreshold . BLOCK_LOW_AND_ABOVE , }, ], }); const unsafePrompt = "I support Martians Soccer Club and I think " + "Jupiterians Football Club sucks! Write an ironic phrase telling " + "them how I feel about them." ; const result = await model . generateContent ( unsafePrompt ); try { result . response . text (); } catch ( e ) { console . error ( e ); console . log ( result . response . candidates [ 0 ]. safetyRatings ); } After Python from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'say something bad' , config = types . GenerateContentConfig ( safety_settings = [ types . SafetySetting ( category = 'HARM_CATEGORY_HATE_SPEECH' , threshold = 'BLOCK_ONLY_HIGH' ), ] ), ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const unsafePrompt = "I support Martians Soccer Club and I think " + "Jupiterians Football Club sucks! Write an ironic phrase telling " + "them how I feel about them." ; const response = await ai . models . 
generateContent ({ model : "gemini-2.0-flash" , contents : unsafePrompt , config : { safetySettings : [ { category : "HARM_CATEGORY_HARASSMENT" , threshold : "BLOCK_ONLY_HIGH" , }, ], }, }); console . log ( "Finish reason:" , response . candidates [ 0 ]. finishReason ); console . log ( "Safety ratings:" , response . candidates [ 0 ]. safetyRatings ); Async Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = await model . generate_content_async ( 'tell me a story in 100 words' ) After Python To use the new SDK with asyncio, there is a separate async implementation of every method under client.aio. from google import genai client = genai . Client () response = await client . aio . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'Tell me a story in 300 words.' ) Chat Start a chat and send a message to the model: Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) chat = model . start_chat () response = chat . send_message ( "Tell me a story in 100 words" ) response = chat . send_message ( "What happened after that?" ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); const chat = model . startChat ({ history : [ { role : "user" , parts : [{ text : "Hello" }], }, { role : "model" , parts : [{ text : "Great to meet you. What would you like to know?" }], }, ], }); let result = await chat . sendMessage ( "I have 2 dogs in my house." ); console . log ( result . response . text ()); result = await chat . sendMessage ( "How many paws are in my house?" ); console . log ( result . response . text ()); Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client .
Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) cs := model . StartChat () cs . History = [] * genai . Content { { Parts : [] genai . Part { genai . Text ( "Hello, I have 2 dogs in my house." ), }, Role : "user" , }, { Parts : [] genai . Part { genai . Text ( "Great to meet you. What would you like to know?" ), }, Role : "model" , }, } res , err := cs . SendMessage ( ctx , genai . Text ( "How many paws are in my house?" )) if err != nil { log . Fatal ( err ) } printResponse ( res ) // utility for printing the response After Python from google import genai client = genai . Client () chat = client . chats . create ( model = 'gemini-2.0-flash' ) response = chat . send_message ( message = 'Tell me a story in 100 words' ) response = chat . send_message ( message = 'What happened after that?' ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const chat = ai . chats . create ({ model : "gemini-2.0-flash" , history : [ { role : "user" , parts : [{ text : "Hello" }], }, { role : "model" , parts : [{ text : "Great to meet you. What would you like to know?" }], }, ], }); const response1 = await chat . sendMessage ({ message : "I have 2 dogs in my house." , }); console . log ( "Chat response 1:" , response1 . text ); const response2 = await chat . sendMessage ({ message : "How many paws are in my house?" , }); console . log ( "Chat response 2:" , response2 . text ); Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } chat , err := client . Chats . Create ( ctx , "gemini-2.0-flash" , nil , nil ) if err != nil { log . Fatal ( err ) } result , err := chat . SendMessage ( ctx , genai . Part { Text : "Hello, I have 2 dogs in my house." }) if err != nil { log . Fatal ( err ) } debugPrint ( result ) // utility for printing result result , err = chat . SendMessage ( ctx , genai . Part { Text : "How many paws are in my house?" 
}) if err != nil { log . Fatal ( err ) } debugPrint ( result ) // utility for printing result Function calling Before Python import google.generativeai as genai from enum import Enum def get_current_weather ( location : str ) -> str : """Get the current weather in a given location. Args: location: required, The city and state, e.g. San Francisco, CA unit: celsius or fahrenheit """ print ( f'Called with: {location=}' ) return "23C" model = genai . GenerativeModel ( model_name = "gemini-1.5-flash" , tools = [ get_current_weather ] ) response = model . generate_content ( "What is the weather in San Francisco?" ) function_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call After Python In the new SDK, automatic function calling is the default. Here, you disable it. from google import genai from google.genai import types client = genai . Client () def get_current_weather ( location : str ) -> str : """Get the current weather in a given location. Args: location: required, The city and state, e.g. San Francisco, CA unit: celsius or fahrenheit """ print ( f'Called with: {location=}' ) return "23C" response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = "What is the weather like in Boston?" , config = types . GenerateContentConfig ( tools = [ get_current_weather ], automatic_function_calling = { 'disable' : True }, ), ) function_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call Automatic function calling Before Python The old SDK only supports automatic function calling in chat. In the new SDK this is the default behavior in generate_content. import google.generativeai as genai def get_current_weather ( city : str ) -> str : return "23C" model = genai . GenerativeModel ( model_name = "gemini-1.5-flash" , tools = [ get_current_weather ] ) chat = model . start_chat ( enable_automatic_function_calling = True ) result = chat . send_message ( "What is the weather in San Francisco?"
) After Python from google import genai from google.genai import types client = genai . Client () def get_current_weather ( city : str ) -> str : return "23C" response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = "What is the weather like in Boston?" , config = types . GenerateContentConfig ( tools = [ get_current_weather ] ), ) Code execution Code execution is a tool that allows the model to generate Python code, run it, and return the result. Before Python import google.generativeai as genai model = genai . GenerativeModel ( model_name = "gemini-1.5-flash" , tools = "code_execution" ) result = model . generate_content ( "What is the sum of the first 50 prime numbers? Generate and run code for " "the calculation, and make sure you get all 50." ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" , tools : [{ codeExecution : {} }], }); const result = await model . generateContent ( "What is the sum of the first 50 prime numbers? " + "Generate and run code for the calculation, and make sure you get " + "all 50." , ); console . log ( result . response . text ()); After Python from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'What is the sum of the first 50 prime numbers? Generate and run ' 'code for the calculation, and make sure you get all 50.' , config = types . GenerateContentConfig ( tools = [ types . Tool ( code_execution = types . ToolCodeExecution ())], ), ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . generateContent ({ model : "gemini-2.0-pro-exp-02-05" , config : { tools : [{ codeExecution : {} }] }, contents : `Write and execute code that calculates the sum of the first 50 prime numbers.
Ensure that only the executable code and its resulting output are generated.` , }); // Each part may contain text, executable code, or an execution result. for ( const part of response . candidates [ 0 ]. content . parts ) { console . log ( part ); console . log ( "\n" ); } console . log ( "-" . repeat ( 80 )); // The `.text` accessor concatenates the parts into a markdown-formatted text. console . log ( "\n" , response . text ); Search grounding GoogleSearch (Gemini>=2.0) and GoogleSearchRetrieval (Gemini < 2.0) are tools that allow the model to retrieve public web data for grounding, powered by Google. Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ( contents = "what is the Google stock price?" , tools = 'google_search_retrieval' ) After Python from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'What is the Google stock price?' , config = types . GenerateContentConfig ( tools = [ types . Tool ( google_search = types . GoogleSearch () ) ] ) ) JSON response Generate answers in JSON format. Before Python By specifying a response_schema and setting response_mime_type="application/json" users can constrain the model to produce a JSON response following a given structure. import google.generativeai as genai import typing_extensions as typing class CountryInfo ( typing . TypedDict ): name : str population : int capital : str continent : str major_cities : list [ str ] gdp : int official_language : str total_area_sq_mi : int model = genai . GenerativeModel ( model_name = "gemini-1.5-flash" ) result = model . generate_content ( "Give me information of the United States" , generation_config = genai . 
GenerationConfig ( response_mime_type = "application/json" , response_schema = CountryInfo ), ) JavaScript import { GoogleGenerativeAI , SchemaType } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const schema = { description : "List of recipes" , type : SchemaType . ARRAY , items : { type : SchemaType . OBJECT , properties : { recipeName : { type : SchemaType . STRING , description : "Name of the recipe" , nullable : false , }, }, required : [ "recipeName" ], }, }; const model = genAI . getGenerativeModel ({ model : "gemini-1.5-pro" , generationConfig : { responseMimeType : "application/json" , responseSchema : schema , }, }); const result = await model . generateContent ( "List a few popular cookie recipes." , ); console . log ( result . response . text ()); After Python The new SDK uses pydantic classes to provide the schema (although you can pass a genai.types.Schema , or equivalent dict ). When possible, the SDK will parse the returned JSON, and return the result in response.parsed . If you provided a pydantic class as the schema the SDK will convert that JSON to an instance of the class. from google import genai from pydantic import BaseModel client = genai . Client () class CountryInfo ( BaseModel ): name : str population : int capital : str continent : str major_cities : list [ str ] gdp : int official_language : str total_area_sq_mi : int response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'Give me information of the United States.' , config = { 'response_mime_type' : 'application/json' , 'response_schema' : CountryInfo , }, ) response . parsed JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : "List a few popular cookie recipes." 
, config : { responseMimeType : "application/json" , responseSchema : { type : "array" , items : { type : "object" , properties : { recipeName : { type : "string" }, ingredients : { type : "array" , items : { type : "string" } }, }, required : [ "recipeName" , "ingredients" ], }, }, }, }); console . log ( response . text ); Files Upload Upload a file: Before Python import requests import pathlib import google.generativeai as genai # Download file response = requests . get ( 'https://storage.googleapis.com/generativeai-downloads/data/a11.txt' ) pathlib . Path ( 'a11.txt' ) . write_text ( response . text ) my_file = genai . upload_file ( path = 'a11.txt' ) model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ([ 'Can you summarize this file:' , my_file ]) print ( response . text ) After Python import requests import pathlib from google import genai client = genai . Client () # Download file response = requests . get ( 'https://storage.googleapis.com/generativeai-downloads/data/a11.txt' ) pathlib . Path ( 'a11.txt' ) . write_text ( response . text ) my_file = client . files . upload ( file = 'a11.txt' ) response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = [ 'Can you summarize this file:' , my_file ] ) print ( response . text ) List and get List uploaded files and get an uploaded file with a filename: Before Python import google.generativeai as genai for file in genai . list_files (): print ( file . name ) file = genai . get_file ( name = file . name ) After Python from google import genai client = genai . Client () for file in client . files . list (): print ( file . name ) file = client . files . get ( name = file . name ) Delete Delete a file: Before Python import pathlib import google.generativeai as genai pathlib . Path ( 'dummy.txt' ) . write_text ( dummy ) dummy_file = genai . upload_file ( path = 'dummy.txt' ) file = genai . delete_file ( name = dummy_file . 
name ) After Python import pathlib from google import genai client = genai . Client () pathlib . Path ( 'dummy.txt' ) . write_text ( dummy ) dummy_file = client . files . upload ( file = 'dummy.txt' ) response = client . files . delete ( name = dummy_file . name ) Context caching Context caching allows the user to pass the content to the model once, cache the input tokens, and then refer to the cached tokens in subsequent calls to lower the cost. Before Python import requests import pathlib import google.generativeai as genai from google.generativeai import caching # Download file response = requests . get ( 'https://storage.googleapis.com/generativeai-downloads/data/a11.txt' ) pathlib . Path ( 'a11.txt' ) . write_text ( response . text ) # Upload file document = genai . upload_file ( path = "a11.txt" ) # Create cache apollo_cache = caching . CachedContent . create ( model = "gemini-1.5-flash-001" , system_instruction = "You are an expert at analyzing transcripts." , contents = [ document ], ) # Generate response apollo_model = genai . GenerativeModel . from_cached_content ( cached_content = apollo_cache ) response = apollo_model . generate_content ( "Find a lighthearted moment from this transcript" ) JavaScript import { GoogleAICacheManager , GoogleAIFileManager } from "@google/generative-ai/server" ; import { GoogleGenerativeAI } from "@google/generative-ai" ; const cacheManager = new GoogleAICacheManager ( "GOOGLE_API_KEY" ); const fileManager = new GoogleAIFileManager ( "GOOGLE_API_KEY" ); const uploadResult = await fileManager . uploadFile ( "path/to/a11.txt" , { mimeType : "text/plain" , }); const cacheResult = await cacheManager . create ({ model : "models/gemini-1.5-flash" , contents : [ { role : "user" , parts : [ { fileData : { fileUri : uploadResult . file . uri , mimeType : uploadResult . file . mimeType , }, }, ], }, ], }); console . log ( cacheResult ); const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . 
getGenerativeModelFromCachedContent ( cacheResult ); const result = await model . generateContent ( "Please summarize this transcript." , ); console . log ( result . response . text ()); After Python import requests import pathlib from google import genai from google.genai import types client = genai . Client () # Check which models support caching. for m in client . models . list (): for action in m . supported_actions : if action == "createCachedContent" : print ( m . name ) break # Download file response = requests . get ( 'https://storage.googleapis.com/generativeai-downloads/data/a11.txt' ) pathlib . Path ( 'a11.txt' ) . write_text ( response . text ) # Upload file document = client . files . upload ( file = 'a11.txt' ) # Create cache model = 'gemini-1.5-flash-001' apollo_cache = client . caches . create ( model = model , config = { 'contents' : [ document ], 'system_instruction' : 'You are an expert at analyzing transcripts.' , }, ) # Generate response response = client . models . generate_content ( model = model , contents = 'Find a lighthearted moment from this transcript' , config = types . GenerateContentConfig ( cached_content = apollo_cache . name , ) ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri } from '@google/genai' ; import path from "path" ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const filePath = path . join ( media , "a11.txt" ); const document = await ai . files . upload ({ file : filePath , config : { mimeType : "text/plain" }, }); console . log ( "Uploaded file name:" , document . name ); const modelName = "gemini-1.5-flash" ; const contents = [ createUserContent ( createPartFromUri ( document . uri , document . mimeType )), ]; const cache = await ai . caches . create ({ model : modelName , config : { contents : contents , systemInstruction : "You are an expert analyzing transcripts." , }, }); console . log ( "Cache created:" , cache ); const response = await ai . models . 
generateContent ({ model : modelName , contents : "Please summarize this transcript" , config : { cachedContent : cache . name }, }); console . log ( "Response text:" , response . text ); Count tokens Count the number of tokens in a request. Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . count_tokens ( 'The quick brown fox jumps over the lazy dog.' ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" , }); // Count tokens in a prompt without calling text generation. const countResult = await model . countTokens ( "The quick brown fox jumps over the lazy dog." , ); console . log ( countResult . totalTokens ); // 11 const generateResult = await model . generateContent ( "The quick brown fox jumps over the lazy dog." , ); // On the response for `generateContent`, use `usageMetadata` // to get separate input and output token counts // (`promptTokenCount` and `candidatesTokenCount`, respectively), // as well as the combined token count (`totalTokenCount`). console . log ( generateResult . response . usageMetadata ); // candidatesTokenCount and totalTokenCount depend on response, may vary // { promptTokenCount: 11, candidatesTokenCount: 124, totalTokenCount: 135 } After Python from google import genai client = genai . Client () response = client . models . count_tokens ( model = 'gemini-2.0-flash' , contents = 'The quick brown fox jumps over the lazy dog.' , ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const prompt = "The quick brown fox jumps over the lazy dog." ; const countTokensResponse = await ai . models . countTokens ({ model : "gemini-2.0-flash" , contents : prompt , }); console . log ( countTokensResponse . totalTokens ); const generateResponse = await ai . models . 
generateContent ({ model : "gemini-2.0-flash" , contents : prompt , }); console . log ( generateResponse . usageMetadata ); Generate images Generate images: Before Python #pip install https://github.com/google-gemini/generative-ai-python@imagen import google.generativeai as genai imagen = genai . ImageGenerationModel ( "imagen-3.0-generate-001" ) gen_images = imagen . generate_images ( prompt = "Robot holding a red skateboard" , number_of_images = 1 , safety_filter_level = "block_low_and_above" , person_generation = "allow_adult" , aspect_ratio = "3:4" , ) After Python import pathlib from google import genai from google.genai import types client = genai . Client () gen_images = client . models . generate_images ( model = 'imagen-3.0-generate-001' , prompt = 'Robot holding a red skateboard' , config = types . GenerateImagesConfig ( number_of_images = 1 , safety_filter_level = "BLOCK_LOW_AND_ABOVE" , person_generation = "ALLOW_ADULT" , aspect_ratio = "3:4" , ) ) for n , image in enumerate ( gen_images . generated_images ): pathlib . Path ( f ' { n } .png' ) . write_bytes ( image . image . image_bytes ) Embed content Generate content embeddings. Before Python import google.generativeai as genai response = genai . embed_content ( model = 'models/text-embedding-004' , content = 'Hello world' ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "text-embedding-004" , }); const result = await model . embedContent ( "Hello world!" ); console . log ( result . embedding ); After Python from google import genai client = genai . Client () response = client . models . embed_content ( model = 'text-embedding-004' , contents = 'Hello world' , ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const text = "Hello World!" ; const result = await ai . models . 
embedContent ({ model : "text-embedding-004" , contents : text , config : { outputDimensionality : 10 }, }); console . log ( result . embeddings ); Tune a Model Create and use a tuned model. The new SDK simplifies tuning with client.tunings.tune , which launches the tuning job and polls until the job is complete. Before Python import google.generativeai as genai import random # create tuning model train_data = {} for i in range ( 1 , 6 ): key = f 'input { i } ' value = f 'output { i } ' train_data [ key ] = value name = f 'generate-num- { random . randint ( 0 , 10000 ) } ' operation = genai . create_tuned_model ( source_model = 'models/gemini-1.5-flash-001-tuning' , training_data = train_data , id = name , epoch_count = 5 , batch_size = 4 , learning_rate = 0.001 , ) # wait for tuning complete tuningProgress = operation . result () # generate content with the tuned model model = genai . GenerativeModel ( model_name = f 'tunedModels/ { name } ' ) response = model . generate_content ( '55' ) After Python from google import genai from google.genai import types client = genai . Client () # Check which models are available for tuning. for m in client . models . list (): for action in m . supported_actions : if action == "createTunedModel" : print ( m . name ) break # create tuning model training_dataset = types . TuningDataset ( examples = [ types . TuningExample ( text_input = f 'input { i } ' , output = f 'output { i } ' , ) for i in range ( 5 ) ], ) tuning_job = client . tunings . tune ( base_model = 'models/gemini-1.5-flash-001-tuning' , training_dataset = training_dataset , config = types . CreateTuningJobConfig ( epoch_count = 5 , batch_size = 4 , learning_rate = 0.001 , tuned_model_display_name = "test tuned model" ) ) # generate content with the tuned model response = client . models . generate_content ( model = tuning_job . tuned_model . 
model , contents = '55' , ) Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-09 UTC.
|
text_content/docs_model-tuning_dfd6a211.txt
ADDED
|
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/model-tuning#main-content
|
| 2 |
+
Title: Fine-tuning with the Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Fine-tuning with the Gemini API With the deprecation of Gemini 1.5 Flash-001 in May 2025, we no longer have a model available which supports fine-tuning in the Gemini API, but it is supported in Vertex AI. We plan to bring fine-tuning support back in the future. We would love to hear from you on our developer forum if fine-tuning is important to your use case. Last updated 2025-06-30 UTC.
|
text_content/docs_models_08e6e52a.txt
ADDED
|
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/models#gemini-2.0-flash-lite
|
| 2 |
+
Title: Gemini models | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Gemini models 2.5 Pro: Our most powerful thinking model with maximum response accuracy and state-of-the-art performance. Input audio, images, video, and text, and get text responses. Tackle difficult problems, analyze large databases, and more. Best for complex coding, reasoning, and multimodal understanding. 2.5 Flash: Our best model in terms of price-performance, offering well-rounded capabilities. Input audio, images, video, and text, and get text responses. The model thinks as needed, or you can configure a thinking budget. Best for low-latency, high-volume tasks that require thinking. 2.5 Flash-Lite: A Gemini 2.5 Flash model optimized for cost efficiency and low latency. Input audio, images, video, and text, and get text responses. Most cost-efficient model supporting high throughput. Best for real-time, low-latency use cases. Note: Gemini 2.5 Pro and 2.5 Flash come with thinking on by default. If you're migrating from a non-thinking model such as 2.0 Pro or Flash, we recommend reviewing the Thinking guide first. Model variants The Gemini API offers different models that are optimized for specific use cases. 
Here's a brief overview of Gemini variants that are available: Model variant Input(s) Output Optimized for Gemini 2.5 Pro gemini-2.5-pro Audio, images, videos, text, and PDF Text Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more Gemini 2.5 Flash gemini-2.5-flash Audio, images, videos, and text Text Adaptive thinking, cost efficiency Gemini 2.5 Flash-Lite Preview gemini-2.5-flash-lite-preview-06-17 Text, image, video, audio Text Most cost-efficient model supporting high throughput Gemini 2.5 Flash Native Audio gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog Audio, videos, and text Text and audio, interleaved High quality, natural conversational audio outputs, with or without thinking Gemini 2.5 Flash Preview TTS gemini-2.5-flash-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.5 Pro Preview TTS gemini-2.5-pro-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.0 Flash gemini-2.0-flash Audio, images, videos, and text Text Next generation features, speed, and realtime streaming. 
Gemini 2.0 Flash Preview Image Generation gemini-2.0-flash-preview-image-generation Audio, images, videos, and text Text, images Conversational image generation and editing Gemini 2.0 Flash-Lite gemini-2.0-flash-lite Audio, images, videos, and text Text Cost efficiency and low latency Gemini 1.5 Flash gemini-1.5-flash Audio, images, videos, and text Text Fast and versatile performance across a diverse variety of tasks Gemini 1.5 Flash-8B gemini-1.5-flash-8b Audio, images, videos, and text Text High volume and lower intelligence tasks Gemini 1.5 Pro gemini-1.5-pro Audio, images, videos, and text Text Complex reasoning tasks requiring more intelligence Gemini Embedding gemini-embedding-exp Text Text embeddings Measuring the relatedness of text strings Imagen 4 imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 Text Images Our most up-to-date image generation model Imagen 3 imagen-3.0-generate-002 Text Images High quality image generation model Veo 2 veo-2.0-generate-001 Text, images Video High quality video generation Gemini 2.5 Flash Live gemini-live-2.5-flash-preview Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions Gemini 2.0 Flash Live gemini-2.0-flash-live-001 Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions You can view the rate limits for each model on the rate limits page . Gemini 2.5 Pro Gemini 2.5 Pro is our state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context. 
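The variant table above can be treated as a lookup from required inputs and outputs to candidate model codes. The following is a minimal sketch of that idea, assuming only a hand-copied subset of the table; the names `MODEL_VARIANTS` and `models_for` are hypothetical helpers for illustration and are not part of any Gemini SDK.

```python
# Hand-copied subset of the variant table: model code -> (inputs, outputs).
# MODEL_VARIANTS and models_for are hypothetical names, not SDK objects.
MODEL_VARIANTS = {
    "gemini-2.5-pro": ({"audio", "images", "video", "text", "pdf"}, {"text"}),
    "gemini-2.5-flash": ({"audio", "images", "video", "text"}, {"text"}),
    "gemini-2.5-flash-preview-tts": ({"text"}, {"audio"}),
    "imagen-3.0-generate-002": ({"text"}, {"images"}),
    "veo-2.0-generate-001": ({"text", "images"}, {"video"}),
}


def models_for(required_inputs, required_output):
    """Return the sorted model codes that accept every required input
    type and list the required output type."""
    needed = set(required_inputs)
    return sorted(
        code
        for code, (inputs, outputs) in MODEL_VARIANTS.items()
        if needed <= inputs and required_output in outputs
    )


if __name__ == "__main__":
    # Which listed variants turn text into audio?
    print(models_for({"text"}, "audio"))
    # Which listed variants accept PDF input and return text?
    print(models_for({"pdf"}, "text"))
```

A static table like this goes stale as model codes change, so it may be safer to treat it as a reading aid for the table rather than as configuration.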
Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro save Supported data types Inputs Audio, images, video, text, and PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-pro Preview: gemini-2.5-pro-preview-06-05 Preview: gemini-2.5-pro-preview-05-06 Preview: gemini-2.5-pro-preview-03-25 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Our best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large scale processing, low-latency, high volume tasks that require thinking, and agentic use cases. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash save Supported data types Inputs Text, images, video, audio Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Audio generation Not supported Caching Supported Code execution Supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Supported Thinking Supported Tuning Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-flash Preview: gemini-2.5-flash-preview-05-20 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash-Lite Preview A Gemini 2.5 Flash model optimized for cost efficiency and low latency. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-lite-preview-06-17 save Supported data types Inputs Text, images, video, and audio Output Text token_auto Token limits [*] Input token limit 1,000,000 Output token limit 64,000 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported URL Context Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-lite-preview-06-17 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Native Audio Our native audio dialog models, with and without thinking, available through the Live API . These models provide interactive and unstructured conversational experiences, with style and control prompting. Try native audio in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-native-audio-dialog & models/gemini-2.5-flash-exp-native-audio-thinking-dialog save Supported data types Inputs Audio, video, text Output Audio and text token_auto Token limits [*] Input token limit 128,000 Output token limit 8,000 handyman Capabilities Audio generation Supported Caching Not supported Code execution Not supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Not supported Thinking Supported Tuning Not supported 123 Versions Read the model version patterns for more details. 
Preview: gemini-2.5-flash-preview-05-20 Experimental: gemini-2.5-flash-exp-native-audio-thinking-dialog calendar_month Latest update May 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Preview Text-to-Speech Gemini 2.5 Flash Preview TTS is our price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Flash rate limits are more restricted since it is an experimental / preview model. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-flash-preview-tts calendar_month Latest update May 2025 Gemini 2.5 Pro Preview Text-to-Speech Gemini 2.5 Pro Preview TTS is our most powerful text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Pro rate limits are more restricted since it is an experimental / preview model. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-pro-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-pro-preview-tts calendar_month Latest update May 2025 Gemini 2.0 Flash Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Not supported Live API Supported Thinking Experimental Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash Stable: gemini-2.0-flash-001 Experimental: gemini-2.0-flash-exp calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Preview Image Generation Gemini 2.0 Flash Preview Image Generation delivers improved image generation features, including generating and editing images conversationally. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-preview-image-generation save Supported data types Inputs Audio, images, video, and text Output Text and images token_auto Token limits [*] Input token limit 32,000 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation is not currently supported in a number of countries in Europe, Middle East & Africa calendar_month Latest update May 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash-Lite A Gemini 2.0 Flash model optimized for cost efficiency and low latency. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-lite save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Not supported Search Not supported Image generation Not supported Audio generation Not supported Live API Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash-lite Stable: gemini-2.0-flash-lite-001 calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 1.5 Flash Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-latest Latest stable: gemini-1.5-flash Stable: gemini-1.5-flash-001 gemini-1.5-flash-002 calendar_month Latest update September 2024 Gemini 1.5 Flash-8B Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash-8b save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-8b-latest Latest stable: gemini-1.5-flash-8b Stable: gemini-1.5-flash-8b-001 calendar_month Latest update October 2024 Gemini 1.5 Pro Try Gemini 2.5 Pro Preview , our most advanced Gemini model to date. 
Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide-range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-pro save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 2,097,152 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 7,200 Maximum video length 2 hours Maximum audio length Approximately 19 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-pro-latest Latest stable: gemini-1.5-pro Stable: gemini-1.5-pro-001 gemini-1.5-pro-002 calendar_month Latest update September 2024 Imagen 4 Imagen 4 is our latest image model, capable of generating highly detailed images with rich lighting, significantly better text rendering, and higher resolution output than previous models. Model details Property Description id_card Model code Gemini API imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit 480 tokens (text) Output images 1 (Ultra) 1 to 4 (Standard) calendar_month Latest update June 2025 Imagen 3 Imagen 3 is our highest quality text-to-image model, capable of generating images with even better detail, richer lighting and fewer distracting artifacts than our previous models. 
Model details Property Description id_card Model code Gemini API imagen-3.0-generate-002 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit N/A Output images Up to 4 calendar_month Latest update February 2025 Veo 2 Veo 2 is our high quality text- and image-to-video model, capable of generating detailed videos, capturing the artistic nuance in your prompts. Model details Property Description id_card Model code Gemini API veo-2.0-generate-001 save Supported data types Input Text, image Output Video token_auto Limits Text input N/A Image input Any image resolution and aspect ratio up to 20MB file size Output video Up to 2 calendar_month Latest update April 2025 Gemini 2.5 Flash Live The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-live-2.5-flash-preview save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-live-2.5-flash-preview calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.0 Flash Live The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-live-001 save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-live-001 calendar_month Latest update April 2025 cognition_2 Knowledge cutoff August 2024 Gemini Embedding Experimental Gemini embedding achieves a SOTA performance across many key dimensions including code, multi-lingual, and retrieval. Gemini Embedding rate limits are more restricted since it is an experimental model. Model details Property Description id_card Model code Gemini API gemini-embedding-exp-03-07 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 8,192 Output dimension size Elastic, supports: 3072, 1536, or 768 calendar_month Latest update March 2025 Text Embedding and Embedding Text Embedding Try our new experimental Gemini embedding model which achieves state-of-the-art performance. Text embeddings are used to measure the relatedness of strings and are widely used in many AI applications. text-embedding-004 achieves a stronger retrieval performance and outperforms existing models with comparable dimensions, on the standard MTEB embedding benchmarks. 
Model details Property Description id_card Model code Gemini API models/text-embedding-004 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update April 2024 Embedding Note: Text Embedding is the newer version of the Embedding model. If you're creating a new project, use Text Embedding. You can use the Embedding model to generate text embeddings for input text. The Embedding model is optimized for creating embeddings with 768 dimensions for text of up to 2,048 tokens. Embedding model details Property Description id_card Model code models/embedding-001 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update December 2023 AQA You can use the AQA model to perform Attributed Question-Answering (AQA)–related tasks over a document, corpus, or a set of passages. The AQA model returns answers to questions that are grounded in provided sources, along with estimating answerable probability. Model details Property Description id_card Model code models/aqa save Supported data types Input Text Output Text language Supported language English token_auto Token limits [*] Input token limit 7,168 Output token limit 1,024 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Supported calendar_month Latest update December 2023 See the examples to explore the capabilities of these model variations. [*] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words. 
Model version name patterns
Gemini models are available in either stable, preview, or experimental versions. In your code, you can use one of the following model name formats to specify which model and version you want to use.
Latest stable: Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation>. For example, gemini-2.0-flash.
Stable: Points to a specific stable model. Stable models usually don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.0-flash-001.
Preview: Points to a preview model, which may not be suitable for production use and may come with more restrictive rate limits, but may have billing enabled. To specify a preview version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.5-pro-preview-06-05.
Experimental: Points to an experimental model, which may not be suitable for production use and may come with more restrictive rate limits. We release experimental models to gather feedback and get our latest updates into the hands of developers quickly. To specify an experimental version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.0-pro-exp-02-05.
Experimental models
In addition to stable models, the Gemini API offers experimental models, which may not be suitable for production use and may come with more restrictive rate limits. We release experimental models to gather feedback, get our latest updates into the hands of developers quickly, and highlight the pace of innovation happening at Google. What we learn from experimental launches informs how we release models more widely. An experimental model can be swapped for another without prior notice.
We don't guarantee that an experimental model will become a stable model in the future. Previous experimental models As new versions or stable releases become available, we remove and replace experimental models. You can find the previous experimental models we released in the following section along with the replacement version: Model code Base model Replacement version gemini-2.5-flash-preview-04-17 Gemini 2.5 Flash gemini-2.5-flash-preview-05-20 gemini-2.0-flash-exp-image-generation Gemini 2.0 Flash gemini-2.0-flash-preview-image-generation gemini-2.5-pro-preview-06-05 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-05-06 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-03-25 Gemini 2.5 Pro gemini-2.5-pro gemini-2.0-flash-thinking-exp-01-21 Gemini 2.5 Flash gemini-2.5-flash-preview-04-17 gemini-2.0-pro-exp-02-05 Gemini 2.0 Pro Experimental gemini-2.5-pro-preview-03-25 gemini-2.0-flash-exp Gemini 2.0 Flash gemini-2.0-flash gemini-exp-1206 Gemini 2.0 Pro gemini-2.0-pro-exp-02-05 gemini-2.0-flash-thinking-exp-1219 Gemini 2.0 Flash Thinking gemini-2.0-flash-thinking-exp-01-21 gemini-exp-1121 Gemini gemini-exp-1206 gemini-exp-1114 Gemini gemini-exp-1206 gemini-1.5-pro-exp-0827 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-pro-exp-0801 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-flash-8b-exp-0924 Gemini 1.5 Flash-8B gemini-1.5-flash-8b gemini-1.5-flash-8b-exp-0827 Gemini 1.5 Flash-8B gemini-1.5-flash-8b Supported languages Gemini models are trained to work with the following languages: Arabic ( ar ) Bengali ( bn ) Bulgarian ( bg ) Chinese simplified and traditional ( zh ) Croatian ( hr ) Czech ( cs ) Danish ( da ) Dutch ( nl ) English ( en ) Estonian ( et ) Finnish ( fi ) French ( fr ) German ( de ) Greek ( el ) Hebrew ( iw ) Hindi ( hi ) Hungarian ( hu ) Indonesian ( id ) Italian ( it ) Japanese ( ja ) Korean ( ko ) Latvian ( lv ) Lithuanian ( lt ) Norwegian ( no ) Polish ( pl ) Portuguese ( pt ) Romanian ( ro ) Russian ( ru ) Serbian ( sr ) Slovak ( sk ) 
Slovenian ( sl ) Spanish ( es ) Swahili ( sw ) Swedish ( sv ) Thai ( th ) Turkish ( tr ) Ukrainian ( uk ) Vietnamese ( vi ) Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Last updated 2025-07-07 UTC.
|
text_content/docs_models_1ac37306.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/models#veo-2
|
| 2 |
+
Title: Gemini models | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Gemini models
2.5 Pro: Our most powerful thinking model with maximum response accuracy and state-of-the-art performance. Input audio, images, video, and text, and get text responses. Tackle difficult problems, analyze large databases, and more. Best for complex coding, reasoning, and multimodal understanding.
2.5 Flash: Our best model in terms of price-performance, offering well-rounded capabilities. Input audio, images, video, and text, and get text responses. Model thinks as needed; or, you can configure a thinking budget. Best for low latency, high volume tasks that require thinking.
2.5 Flash-Lite: A Gemini 2.5 Flash model optimized for cost efficiency and low latency. Input audio, images, video, and text, and get text responses. Most cost-efficient model supporting high throughput. Best for real time, low latency use cases.
Note: Gemini 2.5 Pro and 2.5 Flash come with thinking on by default. If you're migrating from a non-thinking model such as 2.0 Pro or Flash, we recommend that you review the Thinking guide first.
Model variants
The Gemini API offers different models that are optimized for specific use cases.
Here's a brief overview of Gemini variants that are available:
Model variant | Input(s) | Output | Optimized for
Gemini 2.5 Pro gemini-2.5-pro | Audio, images, videos, text, and PDF | Text | Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more
Gemini 2.5 Flash gemini-2.5-flash | Audio, images, videos, and text | Text | Adaptive thinking, cost efficiency
Gemini 2.5 Flash-Lite Preview gemini-2.5-flash-lite-preview-06-17 | Text, image, video, audio | Text | Most cost-efficient model supporting high throughput
Gemini 2.5 Flash Native Audio gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog | Audio, videos, and text | Text and audio, interleaved | High quality, natural conversational audio outputs, with or without thinking
Gemini 2.5 Flash Preview TTS gemini-2.5-flash-preview-tts | Text | Audio | Low latency, controllable, single- and multi-speaker text-to-speech audio generation
Gemini 2.5 Pro Preview TTS gemini-2.5-pro-preview-tts | Text | Audio | Low latency, controllable, single- and multi-speaker text-to-speech audio generation
Gemini 2.0 Flash gemini-2.0-flash | Audio, images, videos, and text | Text | Next generation features, speed, and realtime streaming
Gemini 2.0 Flash Preview Image Generation gemini-2.0-flash-preview-image-generation | Audio, images, videos, and text | Text, images | Conversational image generation and editing
Gemini 2.0 Flash-Lite gemini-2.0-flash-lite | Audio, images, videos, and text | Text | Cost efficiency and low latency
Gemini 1.5 Flash gemini-1.5-flash | Audio, images, videos, and text | Text | Fast and versatile performance across a diverse variety of tasks
Gemini 1.5 Flash-8B gemini-1.5-flash-8b | Audio, images, videos, and text | Text | High volume and lower intelligence tasks
Gemini 1.5 Pro gemini-1.5-pro | Audio, images, videos, and text | Text | Complex reasoning tasks requiring more intelligence
Gemini Embedding gemini-embedding-exp | Text | Text embeddings | Measuring the relatedness of text strings
Imagen 4 imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 | Text | Images | Our most up-to-date image generation model
Imagen 3 imagen-3.0-generate-002 | Text | Images | High quality image generation model
Veo 2 veo-2.0-generate-001 | Text, images | Video | High quality video generation
Gemini 2.5 Flash Live gemini-live-2.5-flash-preview | Audio, video, and text | Text, audio | Low-latency bidirectional voice and video interactions
Gemini 2.0 Flash Live gemini-2.0-flash-live-001 | Audio, video, and text | Text, audio | Low-latency bidirectional voice and video interactions
You can view the rate limits for each model on the rate limits page.
Gemini 2.5 Pro
Gemini 2.5 Pro is our state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context.
Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro save Supported data types Inputs Audio, images, video, text, and PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-pro Preview: gemini-2.5-pro-preview-06-05 Preview: gemini-2.5-pro-preview-05-06 Preview: gemini-2.5-pro-preview-03-25 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Our best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large scale processing, low-latency, high volume tasks that require thinking, and agentic use cases. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash save Supported data types Inputs Text, images, video, audio Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Audio generation Not supported Caching Supported Code execution Supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Supported Thinking Supported Tuning Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-flash Preview: gemini-2.5-flash-preview-05-20 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash-Lite Preview A Gemini 2.5 Flash model optimized for cost efficiency and low latency. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-lite-preview-06-17 save Supported data types Inputs Text, images, video, and audio Output Text token_auto Token limits [*] Input token limit 1,000,000 Output token limit 64,000 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported URL Context Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-lite-preview-06-17 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Native Audio Our native audio dialog models, with and without thinking, available through the Live API . These models provide interactive and unstructured conversational experiences, with style and control prompting. Try native audio in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-native-audio-dialog & models/gemini-2.5-flash-exp-native-audio-thinking-dialog save Supported data types Inputs Audio, video, text Output Audio and text token_auto Token limits [*] Input token limit 128,000 Output token limit 8,000 handyman Capabilities Audio generation Supported Caching Not supported Code execution Not supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Not supported Thinking Supported Tuning Not supported 123 Versions Read the model version patterns for more details. 
Preview: gemini-2.5-flash-preview-05-20 Experimental: gemini-2.5-flash-exp-native-audio-thinking-dialog calendar_month Latest update May 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Preview Text-to-Speech Gemini 2.5 Flash Preview TTS is our price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Flash rate limits are more restricted since it is an experimental / preview model. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-flash-preview-tts calendar_month Latest update May 2025 Gemini 2.5 Pro Preview Text-to-Speech Gemini 2.5 Pro Preview TTS is our most powerful text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Pro rate limits are more restricted since it is an experimental / preview model. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-pro-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-pro-preview-tts calendar_month Latest update May 2025 Gemini 2.0 Flash Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Not supported Live API Supported Thinking Experimental Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash Stable: gemini-2.0-flash-001 Experimental: gemini-2.0-flash-exp calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Preview Image Generation Gemini 2.0 Flash Preview Image Generation delivers improved image generation features, including generating and editing images conversationally. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-preview-image-generation save Supported data types Inputs Audio, images, video, and text Output Text and images token_auto Token limits [*] Input token limit 32,000 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation is not currently supported in a number of countries in Europe, Middle East & Africa calendar_month Latest update May 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash-Lite A Gemini 2.0 Flash model optimized for cost efficiency and low latency. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-lite save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Not supported Search Not supported Image generation Not supported Audio generation Not supported Live API Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash-lite Stable: gemini-2.0-flash-lite-001 calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 1.5 Flash Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-latest Latest stable: gemini-1.5-flash Stable: gemini-1.5-flash-001 gemini-1.5-flash-002 calendar_month Latest update September 2024 Gemini 1.5 Flash-8B Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash-8b save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-8b-latest Latest stable: gemini-1.5-flash-8b Stable: gemini-1.5-flash-8b-001 calendar_month Latest update October 2024 Gemini 1.5 Pro Try Gemini 2.5 Pro Preview , our most advanced Gemini model to date. 
Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide-range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-pro save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 2,097,152 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 7,200 Maximum video length 2 hours Maximum audio length Approximately 19 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-pro-latest Latest stable: gemini-1.5-pro Stable: gemini-1.5-pro-001 gemini-1.5-pro-002 calendar_month Latest update September 2024 Imagen 4 Imagen 4 is our latest image model, capable of generating highly detailed images with rich lighting, significantly better text rendering, and higher resolution output than previous models. Model details Property Description id_card Model code Gemini API imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit 480 tokens (text) Output images 1 (Ultra) 1 to 4 (Standard) calendar_month Latest update June 2025 Imagen 3 Imagen 3 is our highest quality text-to-image model, capable of generating images with even better detail, richer lighting and fewer distracting artifacts than our previous models. 
Model details Property Description id_card Model code Gemini API imagen-3.0-generate-002 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit N/A Output images Up to 4 calendar_month Latest update February 2025 Veo 2 Veo 2 is our high quality text- and image-to-video model, capable of generating detailed videos, capturing the artistic nuance in your prompts. Model details Property Description id_card Model code Gemini API veo-2.0-generate-001 save Supported data types Input Text, image Output Video token_auto Limits Text input N/A Image input Any image resolution and aspect ratio up to 20MB file size Output video Up to 2 calendar_month Latest update April 2025 Gemini 2.5 Flash Live The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-live-2.5-flash-preview save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-live-2.5-flash-preview calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.0 Flash Live The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-live-001 save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-live-001 calendar_month Latest update April 2025 cognition_2 Knowledge cutoff August 2024 Gemini Embedding Experimental Gemini embedding achieves a SOTA performance across many key dimensions including code, multi-lingual, and retrieval. Gemini Embedding rate limits are more restricted since it is an experimental model. Model details Property Description id_card Model code Gemini API gemini-embedding-exp-03-07 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 8,192 Output dimension size Elastic, supports: 3072, 1536, or 768 calendar_month Latest update March 2025 Text Embedding and Embedding Text Embedding Try our new experimental Gemini embedding model which achieves state-of-the-art performance. Text embeddings are used to measure the relatedness of strings and are widely used in many AI applications. text-embedding-004 achieves a stronger retrieval performance and outperforms existing models with comparable dimensions, on the standard MTEB embedding benchmarks. 
Model details Property Description id_card Model code Gemini API models/text-embedding-004 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update April 2024 Embedding Note: Text Embedding is the newer version of the Embedding model. If you're creating a new project, use Text Embedding. You can use the Embedding model to generate text embeddings for input text. The Embedding model is optimized for creating embeddings with 768 dimensions for text of up to 2,048 tokens. Embedding model details Property Description id_card Model code models/embedding-001 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update December 2023 AQA You can use the AQA model to perform Attributed Question-Answering (AQA)–related tasks over a document, corpus, or a set of passages. The AQA model returns answers to questions that are grounded in provided sources, along with estimating answerable probability. Model details Property Description id_card Model code models/aqa save Supported data types Input Text Output Text language Supported language English token_auto Token limits [*] Input token limit 7,168 Output token limit 1,024 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Supported calendar_month Latest update December 2023 See the examples to explore the capabilities of these model variations. [*] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words. 
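The footnote's rule of thumb (about 4 characters per token) can be turned into a quick pre-flight check against the input token limits listed above. The following is a minimal sketch, not a tokenizer: the function names `estimate_tokens` and `fits_input_limit` are hypothetical, and the result is only an approximation that may differ noticeably from a model's real token count, especially for non-English text.

```python
import math

# Rule of thumb from the footnote: roughly 4 characters per token.
# This is an approximation, not the model's actual tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for `text` (hypothetical helper)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_input_limit(text: str, input_token_limit: int) -> bool:
    """Check the estimate against an input token limit from the tables above."""
    return estimate_tokens(text) <= input_token_limit


if __name__ == "__main__":
    prompt = "Summarize the attached meeting notes in three bullet points."
    # 2,048 is the input token limit listed for text-embedding-004.
    print(estimate_tokens(prompt), fits_input_limit(prompt, 2048))
```

Because the estimate is coarse, it is best used as an early warning for obviously oversized inputs; anything close to a limit would need an exact count.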
Model version name patterns
Gemini models are available in stable, preview, or experimental versions. In your code, you can use one of the following model name formats to specify which model and version you want to use.

Latest stable
Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation>. For example, gemini-2.0-flash.

Stable
Points to a specific stable model. Stable models usually don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.0-flash-001.

Preview
Points to a preview model, which may not be suitable for production use and may come with more restrictive rate limits, but may have billing enabled. To specify a preview version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.5-pro-preview-06-05.

Experimental
Points to an experimental model, which may not be suitable for production use and may come with more restrictive rate limits. We release experimental models to gather feedback and get our latest updates into the hands of developers quickly. To specify an experimental version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.0-pro-exp-02-05.

Experimental models
In addition to stable models, the Gemini API offers experimental models, which may not be suitable for production use and may come with more restrictive rate limits. We release experimental models to gather feedback, get our latest updates into the hands of developers quickly, and highlight the pace of innovation happening at Google. What we learn from experimental launches informs how we release models more widely. An experimental model can be swapped for another without prior notice.
We don't guarantee that an experimental model will become a stable model in the future. Previous experimental models As new versions or stable releases become available, we remove and replace experimental models. You can find the previous experimental models we released in the following section along with the replacement version: Model code Base model Replacement version gemini-2.5-flash-preview-04-17 Gemini 2.5 Flash gemini-2.5-flash-preview-05-20 gemini-2.0-flash-exp-image-generation Gemini 2.0 Flash gemini-2.0-flash-preview-image-generation gemini-2.5-pro-preview-06-05 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-05-06 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-03-25 Gemini 2.5 Pro gemini-2.5-pro gemini-2.0-flash-thinking-exp-01-21 Gemini 2.5 Flash gemini-2.5-flash-preview-04-17 gemini-2.0-pro-exp-02-05 Gemini 2.0 Pro Experimental gemini-2.5-pro-preview-03-25 gemini-2.0-flash-exp Gemini 2.0 Flash gemini-2.0-flash gemini-exp-1206 Gemini 2.0 Pro gemini-2.0-pro-exp-02-05 gemini-2.0-flash-thinking-exp-1219 Gemini 2.0 Flash Thinking gemini-2.0-flash-thinking-exp-01-21 gemini-exp-1121 Gemini gemini-exp-1206 gemini-exp-1114 Gemini gemini-exp-1206 gemini-1.5-pro-exp-0827 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-pro-exp-0801 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-flash-8b-exp-0924 Gemini 1.5 Flash-8B gemini-1.5-flash-8b gemini-1.5-flash-8b-exp-0827 Gemini 1.5 Flash-8B gemini-1.5-flash-8b Supported languages Gemini models are trained to work with the following languages: Arabic ( ar ) Bengali ( bn ) Bulgarian ( bg ) Chinese simplified and traditional ( zh ) Croatian ( hr ) Czech ( cs ) Danish ( da ) Dutch ( nl ) English ( en ) Estonian ( et ) Finnish ( fi ) French ( fr ) German ( de ) Greek ( el ) Hebrew ( iw ) Hindi ( hi ) Hungarian ( hu ) Indonesian ( id ) Italian ( it ) Japanese ( ja ) Korean ( ko ) Latvian ( lv ) Lithuanian ( lt ) Norwegian ( no ) Polish ( pl ) Portuguese ( pt ) Romanian ( ro ) Russian ( ru ) Serbian ( sr ) Slovak ( sk ) 
Slovenian ( sl ) Spanish ( es ) Swahili ( sw ) Swedish ( sv ) Thai ( th ) Turkish ( tr ) Ukrainian ( uk ) Vietnamese ( vi ) Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-07 UTC.
|
text_content/docs_models_214b6f90.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/models#gemini-2.5-flash
|
| 2 |
+
Title: Gemini models | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Gemini models

2.5 Pro
Our most powerful thinking model with maximum response accuracy and state-of-the-art performance
Input audio, images, video, and text, and get text responses
Tackle difficult problems, analyze large databases, and more
Best for complex coding, reasoning, and multimodal understanding

2.5 Flash
Our best model in terms of price-performance, offering well-rounded capabilities.
Input audio, images, video, and text, and get text responses
Model thinks as needed; or, you can configure a thinking budget
Best for low latency, high volume tasks that require thinking

2.5 Flash-Lite
A Gemini 2.5 Flash model optimized for cost efficiency and low latency.
Input audio, images, video, and text, and get text responses
Most cost-efficient model supporting high throughput
Best for real time, low latency use cases

Note: Gemini 2.5 Pro and 2.5 Flash come with thinking on by default. If you're migrating from a non-thinking model such as 2.0 Pro or Flash, we recommend that you review the Thinking guide first.

Model variants
The Gemini API offers different models that are optimized for specific use cases.
Here's a brief overview of Gemini variants that are available:

Model variant | Input(s) | Output | Optimized for
Gemini 2.5 Pro (gemini-2.5-pro) | Audio, images, videos, text, and PDF | Text | Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more
Gemini 2.5 Flash (gemini-2.5-flash) | Audio, images, videos, and text | Text | Adaptive thinking, cost efficiency
Gemini 2.5 Flash-Lite Preview (gemini-2.5-flash-lite-preview-06-17) | Text, image, video, audio | Text | Most cost-efficient model supporting high throughput
Gemini 2.5 Flash Native Audio (gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog) | Audio, videos, and text | Text and audio, interleaved | High quality, natural conversational audio outputs, with or without thinking
Gemini 2.5 Flash Preview TTS (gemini-2.5-flash-preview-tts) | Text | Audio | Low latency, controllable, single- and multi-speaker text-to-speech audio generation
Gemini 2.5 Pro Preview TTS (gemini-2.5-pro-preview-tts) | Text | Audio | Low latency, controllable, single- and multi-speaker text-to-speech audio generation
Gemini 2.0 Flash (gemini-2.0-flash) | Audio, images, videos, and text | Text | Next generation features, speed, and realtime streaming.
Gemini 2.0 Flash Preview Image Generation (gemini-2.0-flash-preview-image-generation) | Audio, images, videos, and text | Text, images | Conversational image generation and editing
Gemini 2.0 Flash-Lite (gemini-2.0-flash-lite) | Audio, images, videos, and text | Text | Cost efficiency and low latency
Gemini 1.5 Flash (gemini-1.5-flash) | Audio, images, videos, and text | Text | Fast and versatile performance across a diverse variety of tasks
Gemini 1.5 Flash-8B (gemini-1.5-flash-8b) | Audio, images, videos, and text | Text | High volume and lower intelligence tasks
Gemini 1.5 Pro (gemini-1.5-pro) | Audio, images, videos, and text | Text | Complex reasoning tasks requiring more intelligence
Gemini Embedding (gemini-embedding-exp) | Text | Text embeddings | Measuring the relatedness of text strings
Imagen 4 (imagen-4.0-generate-preview-06-06, imagen-4.0-ultra-generate-preview-06-06) | Text | Images | Our most up-to-date image generation model
Imagen 3 (imagen-3.0-generate-002) | Text | Images | High quality image generation model
Veo 2 (veo-2.0-generate-001) | Text, images | Video | High quality video generation
Gemini 2.5 Flash Live (gemini-live-2.5-flash-preview) | Audio, video, and text | Text, audio | Low-latency bidirectional voice and video interactions
Gemini 2.0 Flash Live (gemini-2.0-flash-live-001) | Audio, video, and text | Text, audio | Low-latency bidirectional voice and video interactions

You can view the rate limits for each model on the rate limits page.

Gemini 2.5 Pro
Gemini 2.5 Pro is our state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context.
Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro save Supported data types Inputs Audio, images, video, text, and PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-pro Preview: gemini-2.5-pro-preview-06-05 Preview: gemini-2.5-pro-preview-05-06 Preview: gemini-2.5-pro-preview-03-25 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Our best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large scale processing, low-latency, high volume tasks that require thinking, and agentic use cases. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash save Supported data types Inputs Text, images, video, audio Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Audio generation Not supported Caching Supported Code execution Supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Supported Thinking Supported Tuning Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-flash Preview: gemini-2.5-flash-preview-05-20 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash-Lite Preview A Gemini 2.5 Flash model optimized for cost efficiency and low latency. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-lite-preview-06-17 save Supported data types Inputs Text, images, video, and audio Output Text token_auto Token limits [*] Input token limit 1,000,000 Output token limit 64,000 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported URL Context Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-lite-preview-06-17 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Native Audio Our native audio dialog models, with and without thinking, available through the Live API . These models provide interactive and unstructured conversational experiences, with style and control prompting. Try native audio in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-native-audio-dialog & models/gemini-2.5-flash-exp-native-audio-thinking-dialog save Supported data types Inputs Audio, video, text Output Audio and text token_auto Token limits [*] Input token limit 128,000 Output token limit 8,000 handyman Capabilities Audio generation Supported Caching Not supported Code execution Not supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Not supported Thinking Supported Tuning Not supported 123 Versions Read the model version patterns for more details. 
Preview: gemini-2.5-flash-preview-05-20 Experimental: gemini-2.5-flash-exp-native-audio-thinking-dialog calendar_month Latest update May 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Preview Text-to-Speech Gemini 2.5 Flash Preview TTS is our price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Flash rate limits are more restricted since it is an experimental / preview model. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-flash-preview-tts calendar_month Latest update May 2025 Gemini 2.5 Pro Preview Text-to-Speech Gemini 2.5 Pro Preview TTS is our most powerful text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Pro rate limits are more restricted since it is an experimental / preview model. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-pro-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-pro-preview-tts calendar_month Latest update May 2025 Gemini 2.0 Flash Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Not supported Live API Supported Thinking Experimental Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash Stable: gemini-2.0-flash-001 Experimental: gemini-2.0-flash-exp calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Preview Image Generation Gemini 2.0 Flash Preview Image Generation delivers improved image generation features, including generating and editing images conversationally. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-preview-image-generation save Supported data types Inputs Audio, images, video, and text Output Text and images token_auto Token limits [*] Input token limit 32,000 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation is not currently supported in a number of countries in Europe, Middle East & Africa calendar_month Latest update May 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash-Lite A Gemini 2.0 Flash model optimized for cost efficiency and low latency. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-lite save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Not supported Search Not supported Image generation Not supported Audio generation Not supported Live API Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash-lite Stable: gemini-2.0-flash-lite-001 calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 1.5 Flash Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-latest Latest stable: gemini-1.5-flash Stable: gemini-1.5-flash-001 gemini-1.5-flash-002 calendar_month Latest update September 2024 Gemini 1.5 Flash-8B Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash-8b save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-8b-latest Latest stable: gemini-1.5-flash-8b Stable: gemini-1.5-flash-8b-001 calendar_month Latest update October 2024 Gemini 1.5 Pro Try Gemini 2.5 Pro Preview , our most advanced Gemini model to date. 
Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide-range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-pro save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 2,097,152 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 7,200 Maximum video length 2 hours Maximum audio length Approximately 19 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-pro-latest Latest stable: gemini-1.5-pro Stable: gemini-1.5-pro-001 gemini-1.5-pro-002 calendar_month Latest update September 2024 Imagen 4 Imagen 4 is our latest image model, capable of generating highly detailed images with rich lighting, significantly better text rendering, and higher resolution output than previous models. Model details Property Description id_card Model code Gemini API imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit 480 tokens (text) Output images 1 (Ultra) 1 to 4 (Standard) calendar_month Latest update June 2025 Imagen 3 Imagen 3 is our highest quality text-to-image model, capable of generating images with even better detail, richer lighting and fewer distracting artifacts than our previous models. 
Model details Property Description id_card Model code Gemini API imagen-3.0-generate-002 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit N/A Output images Up to 4 calendar_month Latest update February 2025 Veo 2 Veo 2 is our high quality text- and image-to-video model, capable of generating detailed videos, capturing the artistic nuance in your prompts. Model details Property Description id_card Model code Gemini API veo-2.0-generate-001 save Supported data types Input Text, image Output Video token_auto Limits Text input N/A Image input Any image resolution and aspect ratio up to 20MB file size Output video Up to 2 calendar_month Latest update April 2025 Gemini 2.5 Flash Live The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-live-2.5-flash-preview save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-live-2.5-flash-preview calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.0 Flash Live The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-live-001 save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-live-001 calendar_month Latest update April 2025 cognition_2 Knowledge cutoff August 2024 Gemini Embedding Experimental Gemini embedding achieves a SOTA performance across many key dimensions including code, multi-lingual, and retrieval. Gemini Embedding rate limits are more restricted since it is an experimental model. Model details Property Description id_card Model code Gemini API gemini-embedding-exp-03-07 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 8,192 Output dimension size Elastic, supports: 3072, 1536, or 768 calendar_month Latest update March 2025 Text Embedding and Embedding Text Embedding Try our new experimental Gemini embedding model which achieves state-of-the-art performance. Text embeddings are used to measure the relatedness of strings and are widely used in many AI applications. text-embedding-004 achieves a stronger retrieval performance and outperforms existing models with comparable dimensions, on the standard MTEB embedding benchmarks. 
Model details Property Description id_card Model code Gemini API models/text-embedding-004 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update April 2024 Embedding Note: Text Embedding is the newer version of the Embedding model. If you're creating a new project, use Text Embedding. You can use the Embedding model to generate text embeddings for input text. The Embedding model is optimized for creating embeddings with 768 dimensions for text of up to 2,048 tokens. Embedding model details Property Description id_card Model code models/embedding-001 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update December 2023 AQA You can use the AQA model to perform Attributed Question-Answering (AQA)–related tasks over a document, corpus, or a set of passages. The AQA model returns answers to questions that are grounded in provided sources, along with estimating answerable probability. Model details Property Description id_card Model code models/aqa save Supported data types Input Text Output Text language Supported language English token_auto Token limits [*] Input token limit 7,168 Output token limit 1,024 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Supported calendar_month Latest update December 2023 See the examples to explore the capabilities of these model variations. [*] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words. 
Model version name patterns
Gemini models are available in stable, preview, or experimental versions. In your code, you can use one of the following model name formats to specify which model and version you want to use.

Latest stable
Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation>. For example, gemini-2.0-flash.

Stable
Points to a specific stable model. Stable models usually don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.0-flash-001.

Preview
Points to a preview model, which may not be suitable for production use and may come with more restrictive rate limits, but may have billing enabled. To specify a preview version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.5-pro-preview-06-05.

Experimental
Points to an experimental model, which may not be suitable for production use and may come with more restrictive rate limits. We release experimental models to gather feedback and get our latest updates into the hands of developers quickly. To specify an experimental version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.0-pro-exp-02-05.

Experimental models
In addition to stable models, the Gemini API offers experimental models, which may not be suitable for production use and may come with more restrictive rate limits. We release experimental models to gather feedback, get our latest updates into the hands of developers quickly, and highlight the pace of innovation happening at Google. What we learn from experimental launches informs how we release models more widely. An experimental model can be swapped for another without prior notice.
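The naming patterns described above can be sketched as a small classifier. This is a heuristic under stated assumptions, not an official API: the function name `classify_model_code` and the regular expressions are hypothetical, and they only look at the "-exp", "-preview", and three-digit version markers that appear in the examples on this page. Some model codes listed on the page do not follow the heuristic (for example, gemini-2.0-flash-live-001 is listed as a preview version), so treat the result as a hint only.

```python
import re

# Hypothetical patterns inferred from the examples on this page.
_EXPERIMENTAL = re.compile(r"(^|-)exp(-|$)")
_PREVIEW = re.compile(r"-preview(-|$)")
_STABLE_VERSION = re.compile(r"-\d{3}$")  # e.g. the "-001" in gemini-2.0-flash-001


def classify_model_code(code: str) -> str:
    """Guess the version type of a model code (heuristic, not authoritative)."""
    prefix = "models/"
    if code.startswith(prefix):
        code = code[len(prefix):]
    if _EXPERIMENTAL.search(code):
        return "experimental"
    if _PREVIEW.search(code):
        return "preview"
    if _STABLE_VERSION.search(code):
        return "stable"
    # No version suffix: the alias that points to the latest stable version.
    return "latest stable"


for example in (
    "gemini-2.0-flash",
    "gemini-2.0-flash-001",
    "gemini-2.5-pro-preview-06-05",
    "gemini-2.0-pro-exp-02-05",
):
    print(example, "->", classify_model_code(example))
```

A check like this could be used to warn when a production configuration references a preview or experimental model code, since those can change or be replaced.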
We don't guarantee that an experimental model will become a stable model in the future. Previous experimental models As new versions or stable releases become available, we remove and replace experimental models. You can find the previous experimental models we released in the following section along with the replacement version: Model code Base model Replacement version gemini-2.5-flash-preview-04-17 Gemini 2.5 Flash gemini-2.5-flash-preview-05-20 gemini-2.0-flash-exp-image-generation Gemini 2.0 Flash gemini-2.0-flash-preview-image-generation gemini-2.5-pro-preview-06-05 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-05-06 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-03-25 Gemini 2.5 Pro gemini-2.5-pro gemini-2.0-flash-thinking-exp-01-21 Gemini 2.5 Flash gemini-2.5-flash-preview-04-17 gemini-2.0-pro-exp-02-05 Gemini 2.0 Pro Experimental gemini-2.5-pro-preview-03-25 gemini-2.0-flash-exp Gemini 2.0 Flash gemini-2.0-flash gemini-exp-1206 Gemini 2.0 Pro gemini-2.0-pro-exp-02-05 gemini-2.0-flash-thinking-exp-1219 Gemini 2.0 Flash Thinking gemini-2.0-flash-thinking-exp-01-21 gemini-exp-1121 Gemini gemini-exp-1206 gemini-exp-1114 Gemini gemini-exp-1206 gemini-1.5-pro-exp-0827 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-pro-exp-0801 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-flash-8b-exp-0924 Gemini 1.5 Flash-8B gemini-1.5-flash-8b gemini-1.5-flash-8b-exp-0827 Gemini 1.5 Flash-8B gemini-1.5-flash-8b Supported languages Gemini models are trained to work with the following languages: Arabic ( ar ) Bengali ( bn ) Bulgarian ( bg ) Chinese simplified and traditional ( zh ) Croatian ( hr ) Czech ( cs ) Danish ( da ) Dutch ( nl ) English ( en ) Estonian ( et ) Finnish ( fi ) French ( fr ) German ( de ) Greek ( el ) Hebrew ( iw ) Hindi ( hi ) Hungarian ( hu ) Indonesian ( id ) Italian ( it ) Japanese ( ja ) Korean ( ko ) Latvian ( lv ) Lithuanian ( lt ) Norwegian ( no ) Polish ( pl ) Portuguese ( pt ) Romanian ( ro ) Russian ( ru ) Serbian ( sr ) Slovak ( sk ) 
Slovenian ( sl ) Spanish ( es ) Swahili ( sw ) Swedish ( sv ) Thai ( th ) Turkish ( tr ) Ukrainian ( uk ) Vietnamese ( vi )
|
text_content/docs_models_4b62ed0f.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/models#live-api
|
| 2 |
+
Title: Gemini models | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Gemini models

2.5 Pro
Our most powerful thinking model with maximum response accuracy and state-of-the-art performance
Input audio, images, video, and text, and get text responses
Tackle difficult problems, analyze large databases, and more
Best for complex coding, reasoning, and multimodal understanding

2.5 Flash
Our best model in terms of price-performance, offering well-rounded capabilities.
Input audio, images, video, and text, and get text responses
Model thinks as needed; or, you can configure a thinking budget
Best for low latency, high volume tasks that require thinking

2.5 Flash-Lite
A Gemini 2.5 Flash model optimized for cost efficiency and low latency.
Input audio, images, video, and text, and get text responses
Most cost-efficient model supporting high throughput
Best for real time, low latency use cases

Note: Gemini 2.5 Pro and 2.5 Flash come with thinking on by default. If you're migrating from a non-thinking model such as 2.0 Pro or Flash, we recommend that you review the Thinking guide first.

Model variants
The Gemini API offers different models that are optimized for specific use cases.
Here's a brief overview of Gemini variants that are available:

Model variant (model code) | Input(s) | Output | Optimized for
Gemini 2.5 Pro (gemini-2.5-pro) | Audio, images, videos, text, and PDF | Text | Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more
Gemini 2.5 Flash (gemini-2.5-flash) | Audio, images, videos, and text | Text | Adaptive thinking, cost efficiency
Gemini 2.5 Flash-Lite Preview (gemini-2.5-flash-lite-preview-06-17) | Text, image, video, audio | Text | Most cost-efficient model supporting high throughput
Gemini 2.5 Flash Native Audio (gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog) | Audio, videos, and text | Text and audio, interleaved | High quality, natural conversational audio outputs, with or without thinking
Gemini 2.5 Flash Preview TTS (gemini-2.5-flash-preview-tts) | Text | Audio | Low latency, controllable, single- and multi-speaker text-to-speech audio generation
Gemini 2.5 Pro Preview TTS (gemini-2.5-pro-preview-tts) | Text | Audio | Low latency, controllable, single- and multi-speaker text-to-speech audio generation
Gemini 2.0 Flash (gemini-2.0-flash) | Audio, images, videos, and text | Text | Next generation features, speed, and realtime streaming
Gemini 2.0 Flash Preview Image Generation (gemini-2.0-flash-preview-image-generation) | Audio, images, videos, and text | Text, images | Conversational image generation and editing
Gemini 2.0 Flash-Lite (gemini-2.0-flash-lite) | Audio, images, videos, and text | Text | Cost efficiency and low latency
Gemini 1.5 Flash (gemini-1.5-flash) | Audio, images, videos, and text | Text | Fast and versatile performance across a diverse variety of tasks
Gemini 1.5 Flash-8B (gemini-1.5-flash-8b) | Audio, images, videos, and text | Text | High volume and lower intelligence tasks
Gemini 1.5 Pro (gemini-1.5-pro) | Audio, images, videos, and text | Text | Complex reasoning tasks requiring more intelligence
Gemini Embedding (gemini-embedding-exp) | Text | Text embeddings | Measuring the relatedness of text strings
Imagen 4 (imagen-4.0-generate-preview-06-06, imagen-4.0-ultra-generate-preview-06-06) | Text | Images | Our most up-to-date image generation model
Imagen 3 (imagen-3.0-generate-002) | Text | Images | High quality image generation model
Veo 2 (veo-2.0-generate-001) | Text, images | Video | High quality video generation
Gemini 2.5 Flash Live (gemini-live-2.5-flash-preview) | Audio, video, and text | Text, audio | Low-latency bidirectional voice and video interactions
Gemini 2.0 Flash Live (gemini-2.0-flash-live-001) | Audio, video, and text | Text, audio | Low-latency bidirectional voice and video interactions

You can view the rate limits for each model on the rate limits page.

Gemini 2.5 Pro

Gemini 2.5 Pro is our state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context.
Model details
Model code: gemini-2.5-pro
Supported data types: Inputs: audio, images, video, text, and PDF; Output: text
Token limits [*]: Input token limit: 1,048,576; Output token limit: 65,536
Capabilities: Structured outputs: Supported; Caching: Supported; Tuning: Not supported; Function calling: Supported; Code execution: Supported; Search grounding: Supported; Image generation: Not supported; Audio generation: Not supported; Live API: Not supported; Thinking: Supported; Batch API: Supported
Versions (read the model version patterns for more details): Stable: gemini-2.5-pro; Preview: gemini-2.5-pro-preview-06-05; Preview: gemini-2.5-pro-preview-05-06; Preview: gemini-2.5-pro-preview-03-25
Latest update: June 2025
Knowledge cutoff: January 2025

Gemini 2.5 Flash

Our best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large-scale processing, low-latency, high-volume tasks that require thinking, and agentic use cases.

Model details
Model code: models/gemini-2.5-flash
Supported data types: Inputs: text, images, video, audio; Output: text
Token limits [*]: Input token limit: 1,048,576; Output token limit: 65,536
Capabilities: Audio generation: Not supported; Caching: Supported; Code execution: Supported; Function calling: Supported; Image generation: Not supported; Search grounding: Supported; Structured outputs: Supported; Thinking: Supported; Tuning: Not supported; Batch API: Supported
Versions (read the model version patterns for more details): Stable: gemini-2.5-flash; Preview: gemini-2.5-flash-preview-05-20
Latest update: June 2025
Knowledge cutoff: January 2025

Gemini 2.5 Flash-Lite Preview

A Gemini 2.5 Flash model optimized for cost efficiency and low latency.
Model details
Model code: models/gemini-2.5-flash-lite-preview-06-17
Supported data types: Inputs: text, images, video, and audio; Output: text
Token limits [*]: Input token limit: 1,000,000; Output token limit: 64,000
Capabilities: Structured outputs: Supported; Caching: Supported; Tuning: Not supported; Function calling: Supported; Code execution: Supported; URL Context: Supported; Search grounding: Supported; Image generation: Not supported; Audio generation: Not supported; Live API: Not supported; Thinking: Supported
Versions (read the model version patterns for more details): Preview: gemini-2.5-flash-lite-preview-06-17
Latest update: June 2025
Knowledge cutoff: January 2025

Gemini 2.5 Flash Native Audio

Our native audio dialog models, with and without thinking, available through the Live API. These models provide interactive and unstructured conversational experiences, with style and control prompting.

Model details
Model code: models/gemini-2.5-flash-preview-native-audio-dialog & models/gemini-2.5-flash-exp-native-audio-thinking-dialog
Supported data types: Inputs: audio, video, text; Output: audio and text
Token limits [*]: Input token limit: 128,000; Output token limit: 8,000
Capabilities: Audio generation: Supported; Caching: Not supported; Code execution: Not supported; Function calling: Supported; Image generation: Not supported; Search grounding: Supported; Structured outputs: Not supported; Thinking: Supported; Tuning: Not supported
Versions (read the model version patterns for more details):
Preview: gemini-2.5-flash-preview-05-20; Experimental: gemini-2.5-flash-exp-native-audio-thinking-dialog
Latest update: May 2025
Knowledge cutoff: January 2025

Gemini 2.5 Flash Preview Text-to-Speech

Gemini 2.5 Flash Preview TTS is our price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Rate limits for Gemini 2.5 Flash Preview TTS are more restrictive because it is an experimental / preview model.

Model details
Model code: models/gemini-2.5-flash-preview-tts
Supported data types: Inputs: text; Output: audio
Token limits [*]: Input token limit: 8,000; Output token limit: 16,000
Capabilities: Structured outputs: Not supported; Caching: Not supported; Tuning: Not supported; Function calling: Not supported; Code execution: Not supported; Search: Not supported; Audio generation: Supported; Live API: Not supported; Thinking: Not supported
Versions (read the model version patterns for more details): gemini-2.5-flash-preview-tts
Latest update: May 2025

Gemini 2.5 Pro Preview Text-to-Speech

Gemini 2.5 Pro Preview TTS is our most powerful text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Rate limits for Gemini 2.5 Pro Preview TTS are more restrictive because it is an experimental / preview model.
Model details
Model code: models/gemini-2.5-pro-preview-tts
Supported data types: Inputs: text; Output: audio
Token limits [*]: Input token limit: 8,000; Output token limit: 16,000
Capabilities: Structured outputs: Not supported; Caching: Not supported; Tuning: Not supported; Function calling: Not supported; Code execution: Not supported; Search: Not supported; Audio generation: Supported; Live API: Not supported; Thinking: Not supported
Versions (read the model version patterns for more details): gemini-2.5-pro-preview-tts
Latest update: May 2025

Gemini 2.0 Flash

Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window.

Model details
Model code: models/gemini-2.0-flash
Supported data types: Inputs: audio, images, video, and text; Output: text
Token limits [*]: Input token limit: 1,048,576; Output token limit: 8,192
Capabilities: Structured outputs: Supported; Caching: Supported; Tuning: Not supported; Function calling: Supported; Code execution: Supported; Search: Supported; Image generation: Not supported; Audio generation: Not supported; Live API: Supported; Thinking: Experimental; Batch API: Supported
Versions (read the model version patterns for more details): Latest: gemini-2.0-flash; Stable: gemini-2.0-flash-001; Experimental: gemini-2.0-flash-exp
Latest update: February 2025
Knowledge cutoff: August 2024

Gemini 2.0 Flash Preview Image Generation

Gemini 2.0 Flash Preview Image Generation delivers improved image generation features, including generating and editing images conversationally.
Model details
Model code: models/gemini-2.0-flash-preview-image-generation
Supported data types: Inputs: audio, images, video, and text; Output: text and images
Token limits [*]: Input token limit: 32,000; Output token limit: 8,192
Capabilities: Structured outputs: Supported; Caching: Supported; Tuning: Not supported; Function calling: Not supported; Code execution: Not supported; Search: Not supported; Image generation: Supported; Audio generation: Not supported; Live API: Not supported; Thinking: Not supported
Versions (read the model version patterns for more details): Preview: gemini-2.0-flash-preview-image-generation
Note: gemini-2.0-flash-preview-image-generation is not currently supported in a number of countries in Europe, Middle East & Africa.
Latest update: May 2025
Knowledge cutoff: August 2024

Gemini 2.0 Flash-Lite

A Gemini 2.0 Flash model optimized for cost efficiency and low latency.

Model details
Model code: models/gemini-2.0-flash-lite
Supported data types: Inputs: audio, images, video, and text; Output: text
Token limits [*]: Input token limit: 1,048,576; Output token limit: 8,192
Capabilities: Structured outputs: Supported; Caching: Supported; Tuning: Not supported; Function calling: Supported; Code execution: Not supported; Search: Not supported; Image generation: Not supported; Audio generation: Not supported; Live API: Not supported; Batch API: Supported
Versions (read the model version patterns for more details): Latest: gemini-2.0-flash-lite; Stable: gemini-2.0-flash-lite-001
Latest update: February 2025
Knowledge cutoff: August 2024

Gemini 1.5 Flash

Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks.
Model details
Model code: models/gemini-1.5-flash
Supported data types: Inputs: audio, images, video, and text; Output: text
Token limits [*]: Input token limit: 1,048,576; Output token limit: 8,192
Audio/visual specs: Maximum number of images per prompt: 3,600; Maximum video length: 1 hour; Maximum audio length: approximately 9.5 hours
Capabilities: System instructions: Supported; JSON mode: Supported; JSON schema: Supported; Adjustable safety settings: Supported; Caching: Supported; Tuning: Supported; Function calling: Supported; Code execution: Supported; Live API: Not supported
Versions (read the model version patterns for more details): Latest: gemini-1.5-flash-latest; Latest stable: gemini-1.5-flash; Stable: gemini-1.5-flash-001, gemini-1.5-flash-002
Latest update: September 2024

Gemini 1.5 Flash-8B

Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks.

Model details
Model code: models/gemini-1.5-flash-8b
Supported data types: Inputs: audio, images, video, and text; Output: text
Token limits [*]: Input token limit: 1,048,576; Output token limit: 8,192
Audio/visual specs: Maximum number of images per prompt: 3,600; Maximum video length: 1 hour; Maximum audio length: approximately 9.5 hours
Capabilities: System instructions: Supported; JSON mode: Supported; JSON schema: Supported; Adjustable safety settings: Supported; Caching: Supported; Tuning: Supported; Function calling: Supported; Code execution: Supported; Live API: Not supported
Versions (read the model version patterns for more details): Latest: gemini-1.5-flash-8b-latest; Latest stable: gemini-1.5-flash-8b; Stable: gemini-1.5-flash-8b-001
Latest update: October 2024

Gemini 1.5 Pro

Try Gemini 2.5 Pro Preview, our most advanced Gemini model to date.
Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text.

Model details
Model code: models/gemini-1.5-pro
Supported data types: Inputs: audio, images, video, and text; Output: text
Token limits [*]: Input token limit: 2,097,152; Output token limit: 8,192
Audio/visual specs: Maximum number of images per prompt: 7,200; Maximum video length: 2 hours; Maximum audio length: approximately 19 hours
Capabilities: System instructions: Supported; JSON mode: Supported; JSON schema: Supported; Adjustable safety settings: Supported; Caching: Supported; Tuning: Not supported; Function calling: Supported; Code execution: Supported; Live API: Not supported
Versions (read the model version patterns for more details): Latest: gemini-1.5-pro-latest; Latest stable: gemini-1.5-pro; Stable: gemini-1.5-pro-001, gemini-1.5-pro-002
Latest update: September 2024

Imagen 4

Imagen 4 is our latest image model, capable of generating highly detailed images with rich lighting, significantly better text rendering, and higher resolution output than previous models.

Model details
Model code (Gemini API): imagen-4.0-generate-preview-06-06, imagen-4.0-ultra-generate-preview-06-06
Supported data types: Input: text; Output: images
Token limits [*]: Input token limit: 480 tokens (text); Output images: 1 (Ultra), 1 to 4 (Standard)
Latest update: June 2025

Imagen 3

Imagen 3 is our highest quality text-to-image model, capable of generating images with even better detail, richer lighting, and fewer distracting artifacts than our previous models.
Model details
Model code (Gemini API): imagen-3.0-generate-002
Supported data types: Input: text; Output: images
Token limits [*]: Input token limit: N/A; Output images: up to 4
Latest update: February 2025

Veo 2

Veo 2 is our high quality text- and image-to-video model, capable of generating detailed videos, capturing the artistic nuance in your prompts.

Model details
Model code (Gemini API): veo-2.0-generate-001
Supported data types: Input: text, image; Output: video
Limits: Text input: N/A; Image input: any image resolution and aspect ratio up to 20MB file size; Output video: up to 2
Latest update: April 2025

Gemini 2.5 Flash Live

The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output.

Model details
Model code: models/gemini-live-2.5-flash-preview
Supported data types: Inputs: audio, video, and text; Output: text and audio
Token limits [*]: Input token limit: 1,048,576; Output token limit: 8,192
Capabilities: Structured outputs: Supported; Tuning: Not supported; Function calling: Supported; Code execution: Supported; Search: Supported; Image generation: Not supported; Audio generation: Supported; Thinking: Not supported
Versions (read the model version patterns for more details): Preview: gemini-live-2.5-flash-preview
Latest update: June 2025
Knowledge cutoff: January 2025

Gemini 2.0 Flash Live

The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output.
Model details
Model code: models/gemini-2.0-flash-live-001
Supported data types: Inputs: audio, video, and text; Output: text and audio
Token limits [*]: Input token limit: 1,048,576; Output token limit: 8,192
Capabilities: Structured outputs: Supported; Tuning: Not supported; Function calling: Supported; Code execution: Supported; Search: Supported; Image generation: Not supported; Audio generation: Supported; Thinking: Not supported
Versions (read the model version patterns for more details): Preview: gemini-2.0-flash-live-001
Latest update: April 2025
Knowledge cutoff: August 2024

Gemini Embedding Experimental

Gemini Embedding achieves state-of-the-art (SOTA) performance across many key dimensions, including code, multilingual, and retrieval tasks. Gemini Embedding rate limits are more restrictive because it is an experimental model.

Model details
Model code (Gemini API): gemini-embedding-exp-03-07
Supported data types: Input: text; Output: text embeddings
Token limits [*]: Input token limit: 8,192; Output dimension size: elastic, supports 3072, 1536, or 768
Latest update: March 2025

Text Embedding and Embedding

Text Embedding

Try our new experimental Gemini embedding model, which achieves state-of-the-art performance.

Text embeddings are used to measure the relatedness of strings and are widely used in many AI applications. text-embedding-004 achieves stronger retrieval performance and outperforms existing models with comparable dimensions on the standard MTEB embedding benchmarks.
Model details
Model code (Gemini API): models/text-embedding-004
Supported data types: Input: text; Output: text embeddings
Token limits [*]: Input token limit: 2,048; Output dimension size: 768
Rate limits [**]: 1,500 requests per minute
Adjustable safety settings: Not supported
Latest update: April 2024

Embedding

Note: Text Embedding is the newer version of the Embedding model. If you're creating a new project, use Text Embedding.

You can use the Embedding model to generate text embeddings for input text. The Embedding model is optimized for creating embeddings with 768 dimensions for text of up to 2,048 tokens.

Embedding model details
Model code: models/embedding-001
Supported data types: Input: text; Output: text embeddings
Token limits [*]: Input token limit: 2,048; Output dimension size: 768
Rate limits [**]: 1,500 requests per minute
Adjustable safety settings: Not supported
Latest update: December 2023

AQA

You can use the AQA model to perform Attributed Question-Answering (AQA)-related tasks over a document, corpus, or a set of passages. The AQA model returns answers to questions that are grounded in provided sources, along with an estimate of the answerable probability.

Model details
Model code: models/aqa
Supported data types: Input: text; Output: text
Supported language: English
Token limits [*]: Input token limit: 7,168; Output token limit: 1,024
Rate limits [**]: 1,500 requests per minute
Adjustable safety settings: Supported
Latest update: December 2023

See the examples to explore the capabilities of these model variations.

[*] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words.
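The token footnote above gives only a rule of thumb (about 4 characters per token). As a rough illustration, the sketch below turns that rule into a pre-flight size check against the input token limits listed on this page. The helper names `estimate_tokens` and `fits_input_limit` are hypothetical and not part of any Gemini SDK, and a real tokenizer will give different counts, so treat the result as an estimate only.

```python
import math

# Rule of thumb from the footnote: roughly 4 characters per token.
# This is an approximation; actual tokenization varies with the text.
CHARS_PER_TOKEN = 4

# Input token limits copied from the model details on this page.
INPUT_TOKEN_LIMITS = {
    "gemini-2.5-pro": 1_048_576,
    "gemini-1.5-pro": 2_097_152,
    "text-embedding-004": 2_048,
}


def estimate_tokens(text: str) -> int:
    """Estimate the token count as ceil(characters / 4)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_input_limit(text: str, model_code: str) -> bool:
    """Return True if the estimated token count is within the model's input limit."""
    return estimate_tokens(text) <= INPUT_TOKEN_LIMITS[model_code]


prompt = "word " * 3000  # 15,000 characters
print(estimate_tokens(prompt))                         # 3750
print(fits_input_limit(prompt, "text-embedding-004"))  # False
print(fits_input_limit(prompt, "gemini-2.5-pro"))      # True
```

Because the estimate can be off in either direction, it is probably safest to leave generous headroom below the limit rather than relying on this check near the boundary.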
Model version name patterns

Gemini models are available in stable, preview, or experimental versions. In your code, you can use one of the following model name formats to specify which model and version you want to use.

Latest stable: Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation>. For example, gemini-2.0-flash.

Stable: Points to a specific stable model. Stable models usually don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.0-flash-001.

Preview: Points to a preview model, which may not be suitable for production use and comes with more restrictive rate limits, but may have billing enabled. To specify a preview version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.5-pro-preview-06-05.

Experimental: Points to an experimental model, which may not be suitable for production use and comes with more restrictive rate limits. We release experimental models to gather feedback and get our latest updates into the hands of developers quickly. To specify an experimental version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.0-pro-exp-02-05.

Experimental models

In addition to stable models, the Gemini API offers experimental models, which may not be suitable for production use and come with more restrictive rate limits. We release experimental models to gather feedback, get our latest updates into the hands of developers quickly, and highlight the pace of innovation happening at Google. What we learn from experimental launches informs how we release models more widely. An experimental model can be swapped for another without prior notice.
We don't guarantee that an experimental model will become a stable model in the future.

Previous experimental models

As new versions or stable releases become available, we remove and replace experimental models. You can find the previous experimental models we released in the following section along with the replacement version:

Model code | Base model | Replacement version
gemini-2.5-flash-preview-04-17 | Gemini 2.5 Flash | gemini-2.5-flash-preview-05-20
gemini-2.0-flash-exp-image-generation | Gemini 2.0 Flash | gemini-2.0-flash-preview-image-generation
gemini-2.5-pro-preview-06-05 | Gemini 2.5 Pro | gemini-2.5-pro
gemini-2.5-pro-preview-05-06 | Gemini 2.5 Pro | gemini-2.5-pro
gemini-2.5-pro-preview-03-25 | Gemini 2.5 Pro | gemini-2.5-pro
gemini-2.0-flash-thinking-exp-01-21 | Gemini 2.5 Flash | gemini-2.5-flash-preview-04-17
gemini-2.0-pro-exp-02-05 | Gemini 2.0 Pro Experimental | gemini-2.5-pro-preview-03-25
gemini-2.0-flash-exp | Gemini 2.0 Flash | gemini-2.0-flash
gemini-exp-1206 | Gemini 2.0 Pro | gemini-2.0-pro-exp-02-05
gemini-2.0-flash-thinking-exp-1219 | Gemini 2.0 Flash Thinking | gemini-2.0-flash-thinking-exp-01-21
gemini-exp-1121 | Gemini | gemini-exp-1206
gemini-exp-1114 | Gemini | gemini-exp-1206
gemini-1.5-pro-exp-0827 | Gemini 1.5 Pro | gemini-exp-1206
gemini-1.5-pro-exp-0801 | Gemini 1.5 Pro | gemini-exp-1206
gemini-1.5-flash-8b-exp-0924 | Gemini 1.5 Flash-8B | gemini-1.5-flash-8b
gemini-1.5-flash-8b-exp-0827 | Gemini 1.5 Flash-8B | gemini-1.5-flash-8b

Supported languages

Gemini models are trained to work with the following languages: Arabic (ar), Bengali (bn), Bulgarian (bg), Chinese simplified and traditional (zh), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hebrew (iw), Hindi (hi), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swahili (sw), Swedish (sv), Thai (th), Turkish (tr), Ukrainian (uk), Vietnamese (vi)
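Several entries in the "Previous experimental models" table point at model codes that were themselves later replaced, so finding the current code can take more than one lookup. The sketch below follows those links to the end of the chain. The `REPLACEMENTS` mapping is a subset copied from the table on this page; the function name `resolve_replacement` is hypothetical and is not part of any Gemini SDK.

```python
# Subset of the "Previous experimental models" table on this page:
# retired model code -> replacement version.
REPLACEMENTS = {
    "gemini-exp-1121": "gemini-exp-1206",
    "gemini-exp-1206": "gemini-2.0-pro-exp-02-05",
    "gemini-2.0-pro-exp-02-05": "gemini-2.5-pro-preview-03-25",
    "gemini-2.5-pro-preview-03-25": "gemini-2.5-pro",
    "gemini-2.0-flash-exp": "gemini-2.0-flash",
    "gemini-1.5-flash-8b-exp-0924": "gemini-1.5-flash-8b",
}


def resolve_replacement(model_code: str, table: dict[str, str] = REPLACEMENTS) -> str:
    """Follow replacement links until reaching a code with no listed replacement."""
    seen = set()
    current = model_code
    while current in table:
        if current in seen:
            # Guard against a malformed table that loops back on itself.
            raise ValueError(f"replacement cycle detected at {current!r}")
        seen.add(current)
        current = table[current]
    return current


print(resolve_replacement("gemini-exp-1121"))   # gemini-2.5-pro
print(resolve_replacement("gemini-2.0-flash"))  # gemini-2.0-flash (not in the table)
```

A code that does not appear in the mapping is returned unchanged, which only means the table lists no replacement for it, not that the model is necessarily still available.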
|
text_content/docs_models_8d3d236a.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/models#live-api-2.0
|
| 2 |
+
Title: Gemini models | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Gemini models

2.5 Pro
Our most powerful thinking model with maximum response accuracy and state-of-the-art performance.
Input audio, images, video, and text, and get text responses.
Tackle difficult problems, analyze large databases, and more.
Best for complex coding, reasoning, and multimodal understanding.

2.5 Flash
Our best model in terms of price-performance, offering well-rounded capabilities.
Input audio, images, video, and text, and get text responses.
Model thinks as needed, or you can configure a thinking budget.
Best for low-latency, high-volume tasks that require thinking.

2.5 Flash-Lite
A Gemini 2.5 Flash model optimized for cost efficiency and low latency.
Input audio, images, video, and text, and get text responses.
Most cost-efficient model supporting high throughput.
Best for real-time, low-latency use cases.

Note: Gemini 2.5 Pro and 2.5 Flash come with thinking on by default. If you're migrating from a non-thinking model such as 2.0 Pro or Flash, we recommend that you review the Thinking guide first.

Model variants

The Gemini API offers different models that are optimized for specific use cases.
Here's a brief overview of Gemini variants that are available:

Model variant (model code) | Input(s) | Output | Optimized for
Gemini 2.5 Pro (gemini-2.5-pro) | Audio, images, videos, text, and PDF | Text | Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more
Gemini 2.5 Flash (gemini-2.5-flash) | Audio, images, videos, and text | Text | Adaptive thinking, cost efficiency
Gemini 2.5 Flash-Lite Preview (gemini-2.5-flash-lite-preview-06-17) | Text, image, video, audio | Text | Most cost-efficient model supporting high throughput
Gemini 2.5 Flash Native Audio (gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog) | Audio, videos, and text | Text and audio, interleaved | High quality, natural conversational audio outputs, with or without thinking
Gemini 2.5 Flash Preview TTS (gemini-2.5-flash-preview-tts) | Text | Audio | Low latency, controllable, single- and multi-speaker text-to-speech audio generation
Gemini 2.5 Pro Preview TTS (gemini-2.5-pro-preview-tts) | Text | Audio | Low latency, controllable, single- and multi-speaker text-to-speech audio generation
Gemini 2.0 Flash (gemini-2.0-flash) | Audio, images, videos, and text | Text | Next generation features, speed, and realtime streaming
Gemini 2.0 Flash Preview Image Generation (gemini-2.0-flash-preview-image-generation) | Audio, images, videos, and text | Text, images | Conversational image generation and editing
Gemini 2.0 Flash-Lite (gemini-2.0-flash-lite) | Audio, images, videos, and text | Text | Cost efficiency and low latency
Gemini 1.5 Flash (gemini-1.5-flash) | Audio, images, videos, and text | Text | Fast and versatile performance across a diverse variety of tasks
Gemini 1.5 Flash-8B (gemini-1.5-flash-8b) | Audio, images, videos, and text | Text | High volume and lower intelligence tasks
Gemini 1.5 Pro (gemini-1.5-pro) | Audio, images, videos, and text | Text | Complex reasoning tasks requiring more intelligence
Gemini Embedding (gemini-embedding-exp) | Text | Text embeddings | Measuring the relatedness of text strings
Imagen 4 (imagen-4.0-generate-preview-06-06, imagen-4.0-ultra-generate-preview-06-06) | Text | Images | Our most up-to-date image generation model
Imagen 3 (imagen-3.0-generate-002) | Text | Images | High quality image generation model
Veo 2 (veo-2.0-generate-001) | Text, images | Video | High quality video generation
Gemini 2.5 Flash Live (gemini-live-2.5-flash-preview) | Audio, video, and text | Text, audio | Low-latency bidirectional voice and video interactions
Gemini 2.0 Flash Live (gemini-2.0-flash-live-001) | Audio, video, and text | Text, audio | Low-latency bidirectional voice and video interactions

You can view the rate limits for each model on the rate limits page.

Gemini 2.5 Pro

Gemini 2.5 Pro is our state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context.
Model details
Model code: gemini-2.5-pro
Supported data types: Inputs: audio, images, video, text, and PDF; Output: text
Token limits [*]: Input token limit: 1,048,576; Output token limit: 65,536
Capabilities: Structured outputs: Supported; Caching: Supported; Tuning: Not supported; Function calling: Supported; Code execution: Supported; Search grounding: Supported; Image generation: Not supported; Audio generation: Not supported; Live API: Not supported; Thinking: Supported; Batch API: Supported
Versions (read the model version patterns for more details): Stable: gemini-2.5-pro; Preview: gemini-2.5-pro-preview-06-05; Preview: gemini-2.5-pro-preview-05-06; Preview: gemini-2.5-pro-preview-03-25
Latest update: June 2025
Knowledge cutoff: January 2025

Gemini 2.5 Flash

Our best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large-scale processing, low-latency, high-volume tasks that require thinking, and agentic use cases.

Model details
Model code: models/gemini-2.5-flash
Supported data types: Inputs: text, images, video, audio; Output: text
Token limits [*]: Input token limit: 1,048,576; Output token limit: 65,536
Capabilities: Audio generation: Not supported; Caching: Supported; Code execution: Supported; Function calling: Supported; Image generation: Not supported; Search grounding: Supported; Structured outputs: Supported; Thinking: Supported; Tuning: Not supported; Batch API: Supported
Versions (read the model version patterns for more details): Stable: gemini-2.5-flash; Preview: gemini-2.5-flash-preview-05-20
Latest update: June 2025
Knowledge cutoff: January 2025

Gemini 2.5 Flash-Lite Preview

A Gemini 2.5 Flash model optimized for cost efficiency and low latency.
Model details
Model code: models/gemini-2.5-flash-lite-preview-06-17
Supported data types: Inputs: text, images, video, and audio; Output: text
Token limits [*]: Input token limit: 1,000,000; Output token limit: 64,000
Capabilities: Structured outputs: Supported; Caching: Supported; Tuning: Not supported; Function calling: Supported; Code execution: Supported; URL Context: Supported; Search grounding: Supported; Image generation: Not supported; Audio generation: Not supported; Live API: Not supported; Thinking: Supported
Versions (read the model version patterns for more details): Preview: gemini-2.5-flash-lite-preview-06-17
Latest update: June 2025
Knowledge cutoff: January 2025

Gemini 2.5 Flash Native Audio

Our native audio dialog models, with and without thinking, available through the Live API. These models provide interactive and unstructured conversational experiences, with style and control prompting.

Model details
Model code: models/gemini-2.5-flash-preview-native-audio-dialog & models/gemini-2.5-flash-exp-native-audio-thinking-dialog
Supported data types: Inputs: audio, video, text; Output: audio and text
Token limits [*]: Input token limit: 128,000; Output token limit: 8,000
Capabilities: Audio generation: Supported; Caching: Not supported; Code execution: Not supported; Function calling: Supported; Image generation: Not supported; Search grounding: Supported; Structured outputs: Not supported; Thinking: Supported; Tuning: Not supported
Versions (read the model version patterns for more details):
Preview: gemini-2.5-flash-preview-05-20 Experimental: gemini-2.5-flash-exp-native-audio-thinking-dialog calendar_month Latest update May 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Preview Text-to-Speech Gemini 2.5 Flash Preview TTS is our price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Flash rate limits are more restricted since it is an experimental / preview model. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-flash-preview-tts calendar_month Latest update May 2025 Gemini 2.5 Pro Preview Text-to-Speech Gemini 2.5 Pro Preview TTS is our most powerful text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Pro rate limits are more restricted since it is an experimental / preview model. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-pro-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-pro-preview-tts calendar_month Latest update May 2025 Gemini 2.0 Flash Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Not supported Live API Supported Thinking Experimental Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash Stable: gemini-2.0-flash-001 Experimental: gemini-2.0-flash-exp calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Preview Image Generation Gemini 2.0 Flash Preview Image Generation delivers improved image generation features, including generating and editing images conversationally. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-preview-image-generation save Supported data types Inputs Audio, images, video, and text Output Text and images token_auto Token limits [*] Input token limit 32,000 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation is not currently supported in a number of countries in Europe, Middle East & Africa calendar_month Latest update May 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash-Lite A Gemini 2.0 Flash model optimized for cost efficiency and low latency. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-lite save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Not supported Search Not supported Image generation Not supported Audio generation Not supported Live API Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash-lite Stable: gemini-2.0-flash-lite-001 calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 1.5 Flash Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-latest Latest stable: gemini-1.5-flash Stable: gemini-1.5-flash-001 gemini-1.5-flash-002 calendar_month Latest update September 2024 Gemini 1.5 Flash-8B Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash-8b save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-8b-latest Latest stable: gemini-1.5-flash-8b Stable: gemini-1.5-flash-8b-001 calendar_month Latest update October 2024 Gemini 1.5 Pro Try Gemini 2.5 Pro Preview , our most advanced Gemini model to date. 
Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide-range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-pro save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 2,097,152 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 7,200 Maximum video length 2 hours Maximum audio length Approximately 19 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-pro-latest Latest stable: gemini-1.5-pro Stable: gemini-1.5-pro-001 gemini-1.5-pro-002 calendar_month Latest update September 2024 Imagen 4 Imagen 4 is our latest image model, capable of generating highly detailed images with rich lighting, significantly better text rendering, and higher resolution output than previous models. Model details Property Description id_card Model code Gemini API imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit 480 tokens (text) Output images 1 (Ultra) 1 to 4 (Standard) calendar_month Latest update June 2025 Imagen 3 Imagen 3 is our highest quality text-to-image model, capable of generating images with even better detail, richer lighting and fewer distracting artifacts than our previous models. 
Model details Property Description id_card Model code Gemini API imagen-3.0-generate-002 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit N/A Output images Up to 4 calendar_month Latest update February 2025 Veo 2 Veo 2 is our high quality text- and image-to-video model, capable of generating detailed videos, capturing the artistic nuance in your prompts. Model details Property Description id_card Model code Gemini API veo-2.0-generate-001 save Supported data types Input Text, image Output Video token_auto Limits Text input N/A Image input Any image resolution and aspect ratio up to 20MB file size Output video Up to 2 calendar_month Latest update April 2025 Gemini 2.5 Flash Live The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-live-2.5-flash-preview save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-live-2.5-flash-preview calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.0 Flash Live The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-live-001 save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-live-001 calendar_month Latest update April 2025 cognition_2 Knowledge cutoff August 2024 Gemini Embedding Experimental Gemini embedding achieves a SOTA performance across many key dimensions including code, multi-lingual, and retrieval. Gemini Embedding rate limits are more restricted since it is an experimental model. Model details Property Description id_card Model code Gemini API gemini-embedding-exp-03-07 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 8,192 Output dimension size Elastic, supports: 3072, 1536, or 768 calendar_month Latest update March 2025 Text Embedding and Embedding Text Embedding Try our new experimental Gemini embedding model which achieves state-of-the-art performance. Text embeddings are used to measure the relatedness of strings and are widely used in many AI applications. text-embedding-004 achieves a stronger retrieval performance and outperforms existing models with comparable dimensions, on the standard MTEB embedding benchmarks. 
Model details Property Description id_card Model code Gemini API models/text-embedding-004 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update April 2024 Embedding Note: Text Embedding is the newer version of the Embedding model. If you're creating a new project, use Text Embedding. You can use the Embedding model to generate text embeddings for input text. The Embedding model is optimized for creating embeddings with 768 dimensions for text of up to 2,048 tokens. Embedding model details Property Description id_card Model code models/embedding-001 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update December 2023 AQA You can use the AQA model to perform Attributed Question-Answering (AQA)–related tasks over a document, corpus, or a set of passages. The AQA model returns answers to questions that are grounded in provided sources, along with estimating answerable probability. Model details Property Description id_card Model code models/aqa save Supported data types Input Text Output Text language Supported language English token_auto Token limits [*] Input token limit 7,168 Output token limit 1,024 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Supported calendar_month Latest update December 2023 See the examples to explore the capabilities of these model variations. [*] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words. 
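The token note above (about 4 characters per token) lends itself to a quick back-of-the-envelope check before sending a request. The following is a minimal sketch under that assumption only; the function names `estimate_tokens` and `fits_input_limit` are hypothetical helpers, not part of any Gemini SDK, and a real tokenizer or the API's own token count should be treated as authoritative.

```python
import math

# Rule of thumb taken from the note above: roughly 4 characters per token.
# This is an approximation, not an exact tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for `text` (hypothetical helper)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_input_limit(text: str, input_token_limit: int) -> bool:
    """Check the estimate against an input token limit from the tables above."""
    return estimate_tokens(text) <= input_token_limit


# Input token limit listed above for models/text-embedding-004.
TEXT_EMBEDDING_004_INPUT_LIMIT = 2_048

sample = "word " * 2_000  # 10,000 characters
print(estimate_tokens(sample))                                   # 2500
print(fits_input_limit(sample, TEXT_EMBEDDING_004_INPUT_LIMIT))  # False
```

Because the ratio is only approximate, it is safer to leave headroom below the listed limit rather than rely on an estimate that lands exactly on it.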
Model version name patterns

Gemini models are available in stable, preview, or experimental versions. In your code, you can use one of the following model name formats to specify which model and version you want to use.

Latest stable: Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation>. For example, gemini-2.0-flash.

Stable: Points to a specific stable model. Stable models usually don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.0-flash-001.

Preview: Points to a preview model, which may not be suitable for production use and comes with more restrictive rate limits, but may have billing enabled. To specify a preview version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.5-pro-preview-06-05.

Experimental: Points to an experimental model, which may not be suitable for production use and comes with more restrictive rate limits. We release experimental models to gather feedback and get our latest updates into the hands of developers quickly. To specify an experimental version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.0-pro-exp-02-05.

Experimental models

In addition to stable models, the Gemini API offers experimental models, which may not be suitable for production use and come with more restrictive rate limits. We release experimental models to gather feedback, get our latest updates into the hands of developers quickly, and highlight the pace of innovation happening at Google. What we learn from experimental launches informs how we release models more widely. An experimental model can be swapped for another without prior notice.
We don't guarantee that an experimental model will become a stable model in the future.

Previous experimental models

As new versions or stable releases become available, we remove and replace experimental models. You can find the previous experimental models we released in the following section along with the replacement version:

Model code | Base model | Replacement version
gemini-2.5-flash-preview-04-17 | Gemini 2.5 Flash | gemini-2.5-flash-preview-05-20
gemini-2.0-flash-exp-image-generation | Gemini 2.0 Flash | gemini-2.0-flash-preview-image-generation
gemini-2.5-pro-preview-06-05 | Gemini 2.5 Pro | gemini-2.5-pro
gemini-2.5-pro-preview-05-06 | Gemini 2.5 Pro | gemini-2.5-pro
gemini-2.5-pro-preview-03-25 | Gemini 2.5 Pro | gemini-2.5-pro
gemini-2.0-flash-thinking-exp-01-21 | Gemini 2.5 Flash | gemini-2.5-flash-preview-04-17
gemini-2.0-pro-exp-02-05 | Gemini 2.0 Pro Experimental | gemini-2.5-pro-preview-03-25
gemini-2.0-flash-exp | Gemini 2.0 Flash | gemini-2.0-flash
gemini-exp-1206 | Gemini 2.0 Pro | gemini-2.0-pro-exp-02-05
gemini-2.0-flash-thinking-exp-1219 | Gemini 2.0 Flash Thinking | gemini-2.0-flash-thinking-exp-01-21
gemini-exp-1121 | Gemini | gemini-exp-1206
gemini-exp-1114 | Gemini | gemini-exp-1206
gemini-1.5-pro-exp-0827 | Gemini 1.5 Pro | gemini-exp-1206
gemini-1.5-pro-exp-0801 | Gemini 1.5 Pro | gemini-exp-1206
gemini-1.5-flash-8b-exp-0924 | Gemini 1.5 Flash-8B | gemini-1.5-flash-8b
gemini-1.5-flash-8b-exp-0827 | Gemini 1.5 Flash-8B | gemini-1.5-flash-8b

Supported languages

Gemini models are trained to work with the following languages: Arabic (ar), Bengali (bn), Bulgarian (bg), Chinese simplified and traditional (zh), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hebrew (iw), Hindi (hi), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swahili (sw), Swedish (sv), Thai (th), Turkish (tr), Ukrainian (uk), Vietnamese (vi)

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-07 UTC.
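The model version name patterns described earlier on this page (latest stable, stable, preview, experimental) can be approximated with a small classifier. This is a heuristic sketch inferred from the example names on the page, not an official API: the function `classify_model_name` and its rules are assumptions, and some listed names do not follow the patterns exactly (for example, gemini-2.0-flash-live-001 is listed as a preview version but looks like a stable name).

```python
import re


def classify_model_name(name: str) -> str:
    """Guess the release channel of a model code from its name.

    Hypothetical helper: the rules are inferred from examples such as
    gemini-2.0-flash, gemini-2.0-flash-001, gemini-2.5-pro-preview-06-05,
    and gemini-2.0-pro-exp-02-05. Treat the result as a hint only.
    """
    # Model codes sometimes carry a "models/" prefix; drop it first.
    name = name.removeprefix("models/")
    parts = name.split("-")

    if "exp" in parts:
        return "experimental"
    if "preview" in parts:
        return "preview"
    if parts[-1] == "latest":
        return "latest"
    # A trailing three-digit version such as -001 pins a specific stable model.
    if re.fullmatch(r"\d{3}", parts[-1]):
        return "stable"
    # <model>-<generation>-<variation> with no version suffix.
    return "latest stable"


for code in (
    "gemini-2.0-flash",
    "gemini-2.0-flash-001",
    "models/gemini-2.5-pro-preview-06-05",
    "gemini-2.0-pro-exp-02-05",
    "gemini-1.5-flash-latest",
):
    print(code, "->", classify_model_name(code))
```

A check like this could be used to warn when a production configuration points at a preview or experimental name, since the page notes that experimental models can be swapped without prior notice.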
|
text_content/docs_models_b7a54a26.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/models#imagen-3
|
| 2 |
+
Title: Gemini models | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Gemini models | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Gemini models 2.5 Pro spark Our most powerful thinking model with maximum response accuracy and state-of-the-art performance Input audio, images, video, and text, get text responses Tackle difficult problems, analyze large databases, and more Best for complex coding, reasoning, and multimodal understanding 2.5 Flash spark Our best model in terms of price-performance, offering well-rounded capabilities. Input audio, images, video, and text, and get text responses Model thinks as needed; or, you can configure a thinking budget Best for low latency, high volume tasks that require thinking 2.5 Flash-Lite experiment A Gemini 2.5 Flash model optimized for cost efficiency and low latency. Input audio, images, video, and text, and get text responses Most cost-efficient model supporting high throughput Best for real time, low latency use cases Note: Gemini 2.5 Pro and 2.5 Flash come with thinking on by default . If you're migrating from a non-thinking model such as 2.0 Pro or Flash, we recommend you to review the Thinking guide first. Model variants The Gemini API offers different models that are optimized for specific use cases. 
Here's a brief overview of Gemini variants that are available: Model variant Input(s) Output Optimized for Gemini 2.5 Pro gemini-2.5-pro Audio, images, videos, text, and PDF Text Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more Gemini 2.5 Flash gemini-2.5-flash Audio, images, videos, and text Text Adaptive thinking, cost efficiency Gemini 2.5 Flash-Lite Preview gemini-2.5-flash-lite-preview-06-17 Text, image, video, audio Text Most cost-efficient model supporting high throughput Gemini 2.5 Flash Native Audio gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog Audio, videos, and text Text and audio, interleaved High quality, natural conversational audio outputs, with or without thinking Gemini 2.5 Flash Preview TTS gemini-2.5-flash-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.5 Pro Preview TTS gemini-2.5-pro-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.0 Flash gemini-2.0-flash Audio, images, videos, and text Text Next generation features, speed, and realtime streaming. 
Gemini 2.0 Flash Preview Image Generation gemini-2.0-flash-preview-image-generation Audio, images, videos, and text Text, images Conversational image generation and editing Gemini 2.0 Flash-Lite gemini-2.0-flash-lite Audio, images, videos, and text Text Cost efficiency and low latency Gemini 1.5 Flash gemini-1.5-flash Audio, images, videos, and text Text Fast and versatile performance across a diverse variety of tasks Gemini 1.5 Flash-8B gemini-1.5-flash-8b Audio, images, videos, and text Text High volume and lower intelligence tasks Gemini 1.5 Pro gemini-1.5-pro Audio, images, videos, and text Text Complex reasoning tasks requiring more intelligence Gemini Embedding gemini-embedding-exp Text Text embeddings Measuring the relatedness of text strings Imagen 4 imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 Text Images Our most up-to-date image generation model Imagen 3 imagen-3.0-generate-002 Text Images High quality image generation model Veo 2 veo-2.0-generate-001 Text, images Video High quality video generation Gemini 2.5 Flash Live gemini-live-2.5-flash-preview Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions Gemini 2.0 Flash Live gemini-2.0-flash-live-001 Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions You can view the rate limits for each model on the rate limits page . Gemini 2.5 Pro Gemini 2.5 Pro is our state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context. 
Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro save Supported data types Inputs Audio, images, video, text, and PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-pro Preview: gemini-2.5-pro-preview-06-05 Preview: gemini-2.5-pro-preview-05-06 Preview: gemini-2.5-pro-preview-03-25 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Our best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large scale processing, low-latency, high volume tasks that require thinking, and agentic use cases. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash save Supported data types Inputs Text, images, video, audio Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Audio generation Not supported Caching Supported Code execution Supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Supported Thinking Supported Tuning Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-flash Preview: gemini-2.5-flash-preview-05-20 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash-Lite Preview A Gemini 2.5 Flash model optimized for cost efficiency and low latency. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-lite-preview-06-17 save Supported data types Inputs Text, images, video, and audio Output Text token_auto Token limits [*] Input token limit 1,000,000 Output token limit 64,000 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported URL Context Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-lite-preview-06-17 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Native Audio Our native audio dialog models, with and without thinking, available through the Live API . These models provide interactive and unstructured conversational experiences, with style and control prompting. Try native audio in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-native-audio-dialog & models/gemini-2.5-flash-exp-native-audio-thinking-dialog save Supported data types Inputs Audio, video, text Output Audio and text token_auto Token limits [*] Input token limit 128,000 Output token limit 8,000 handyman Capabilities Audio generation Supported Caching Not supported Code execution Not supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Not supported Thinking Supported Tuning Not supported 123 Versions Read the model version patterns for more details. 
Preview: gemini-2.5-flash-preview-05-20 Experimental: gemini-2.5-flash-exp-native-audio-thinking-dialog calendar_month Latest update May 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Preview Text-to-Speech Gemini 2.5 Flash Preview TTS is our price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Flash rate limits are more restricted since it is an experimental / preview model. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-flash-preview-tts calendar_month Latest update May 2025 Gemini 2.5 Pro Preview Text-to-Speech Gemini 2.5 Pro Preview TTS is our most powerful text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Pro rate limits are more restricted since it is an experimental / preview model. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-pro-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-pro-preview-tts calendar_month Latest update May 2025 Gemini 2.0 Flash Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Not supported Live API Supported Thinking Experimental Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash Stable: gemini-2.0-flash-001 Experimental: gemini-2.0-flash-exp calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Preview Image Generation Gemini 2.0 Flash Preview Image Generation delivers improved image generation features, including generating and editing images conversationally. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-preview-image-generation save Supported data types Inputs Audio, images, video, and text Output Text and images token_auto Token limits [*] Input token limit 32,000 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation is not currently supported in a number of countries in Europe, Middle East & Africa calendar_month Latest update May 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash-Lite A Gemini 2.0 Flash model optimized for cost efficiency and low latency. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-lite save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Not supported Search Not supported Image generation Not supported Audio generation Not supported Live API Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash-lite Stable: gemini-2.0-flash-lite-001 calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 1.5 Flash Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-latest Latest stable: gemini-1.5-flash Stable: gemini-1.5-flash-001 gemini-1.5-flash-002 calendar_month Latest update September 2024 Gemini 1.5 Flash-8B Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash-8b save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-8b-latest Latest stable: gemini-1.5-flash-8b Stable: gemini-1.5-flash-8b-001 calendar_month Latest update October 2024 Gemini 1.5 Pro Try Gemini 2.5 Pro Preview , our most advanced Gemini model to date. 
Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide-range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-pro save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 2,097,152 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 7,200 Maximum video length 2 hours Maximum audio length Approximately 19 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-pro-latest Latest stable: gemini-1.5-pro Stable: gemini-1.5-pro-001 gemini-1.5-pro-002 calendar_month Latest update September 2024 Imagen 4 Imagen 4 is our latest image model, capable of generating highly detailed images with rich lighting, significantly better text rendering, and higher resolution output than previous models. Model details Property Description id_card Model code Gemini API imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit 480 tokens (text) Output images 1 (Ultra) 1 to 4 (Standard) calendar_month Latest update June 2025 Imagen 3 Imagen 3 is our highest quality text-to-image model, capable of generating images with even better detail, richer lighting and fewer distracting artifacts than our previous models. 
Model details Property Description id_card Model code Gemini API imagen-3.0-generate-002 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit N/A Output images Up to 4 calendar_month Latest update February 2025 Veo 2 Veo 2 is our high quality text- and image-to-video model, capable of generating detailed videos, capturing the artistic nuance in your prompts. Model details Property Description id_card Model code Gemini API veo-2.0-generate-001 save Supported data types Input Text, image Output Video token_auto Limits Text input N/A Image input Any image resolution and aspect ratio up to 20MB file size Output video Up to 2 calendar_month Latest update April 2025 Gemini 2.5 Flash Live The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-live-2.5-flash-preview save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-live-2.5-flash-preview calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.0 Flash Live The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-live-001 save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-live-001 calendar_month Latest update April 2025 cognition_2 Knowledge cutoff August 2024 Gemini Embedding Experimental Gemini embedding achieves a SOTA performance across many key dimensions including code, multi-lingual, and retrieval. Gemini Embedding rate limits are more restricted since it is an experimental model. Model details Property Description id_card Model code Gemini API gemini-embedding-exp-03-07 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 8,192 Output dimension size Elastic, supports: 3072, 1536, or 768 calendar_month Latest update March 2025 Text Embedding and Embedding Text Embedding Try our new experimental Gemini embedding model which achieves state-of-the-art performance. Text embeddings are used to measure the relatedness of strings and are widely used in many AI applications. text-embedding-004 achieves a stronger retrieval performance and outperforms existing models with comparable dimensions, on the standard MTEB embedding benchmarks. 
Model details Property Description id_card Model code Gemini API models/text-embedding-004 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update April 2024 Embedding Note: Text Embedding is the newer version of the Embedding model. If you're creating a new project, use Text Embedding. You can use the Embedding model to generate text embeddings for input text. The Embedding model is optimized for creating embeddings with 768 dimensions for text of up to 2,048 tokens. Embedding model details Property Description id_card Model code models/embedding-001 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update December 2023 AQA You can use the AQA model to perform Attributed Question-Answering (AQA)–related tasks over a document, corpus, or a set of passages. The AQA model returns answers to questions that are grounded in provided sources, along with estimating answerable probability. Model details Property Description id_card Model code models/aqa save Supported data types Input Text Output Text language Supported language English token_auto Token limits [*] Input token limit 7,168 Output token limit 1,024 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Supported calendar_month Latest update December 2023 See the examples to explore the capabilities of these model variations. [*] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words. 
Model version name patterns
Gemini models are available in either stable, preview, or experimental versions. In your code, you can use one of the following model name formats to specify which model and version you want to use.
Latest stable: Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation>. For example, gemini-2.0-flash.
Stable: Points to a specific stable model. Stable models usually don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.0-flash-001.
Preview: Points to a preview model that may not be suitable for production use and may come with more restrictive rate limits, but may have billing enabled. To specify a preview version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.5-pro-preview-06-05.
Experimental: Points to an experimental model that may not be suitable for production use and may come with more restrictive rate limits. We release experimental models to gather feedback and get our latest updates into the hands of developers quickly. To specify an experimental version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.0-pro-exp-02-05.
Experimental models
In addition to stable models, the Gemini API offers experimental models that may not be suitable for production use and may come with more restrictive rate limits. We release experimental models to gather feedback, get our latest updates into the hands of developers quickly, and highlight the pace of innovation happening at Google. What we learn from experimental launches informs how we release models more widely. An experimental model can be swapped for another without prior notice.
We don't guarantee that an experimental model will become a stable model in the future.
Previous experimental models
As new versions or stable releases become available, we remove and replace experimental models. You can find the previous experimental models we released in the following section along with the replacement version:
Model code | Base model | Replacement version
gemini-2.5-flash-preview-04-17 | Gemini 2.5 Flash | gemini-2.5-flash-preview-05-20
gemini-2.0-flash-exp-image-generation | Gemini 2.0 Flash | gemini-2.0-flash-preview-image-generation
gemini-2.5-pro-preview-06-05 | Gemini 2.5 Pro | gemini-2.5-pro
gemini-2.5-pro-preview-05-06 | Gemini 2.5 Pro | gemini-2.5-pro
gemini-2.5-pro-preview-03-25 | Gemini 2.5 Pro | gemini-2.5-pro
gemini-2.0-flash-thinking-exp-01-21 | Gemini 2.5 Flash | gemini-2.5-flash-preview-04-17
gemini-2.0-pro-exp-02-05 | Gemini 2.0 Pro Experimental | gemini-2.5-pro-preview-03-25
gemini-2.0-flash-exp | Gemini 2.0 Flash | gemini-2.0-flash
gemini-exp-1206 | Gemini 2.0 Pro | gemini-2.0-pro-exp-02-05
gemini-2.0-flash-thinking-exp-1219 | Gemini 2.0 Flash Thinking | gemini-2.0-flash-thinking-exp-01-21
gemini-exp-1121 | Gemini | gemini-exp-1206
gemini-exp-1114 | Gemini | gemini-exp-1206
gemini-1.5-pro-exp-0827 | Gemini 1.5 Pro | gemini-exp-1206
gemini-1.5-pro-exp-0801 | Gemini 1.5 Pro | gemini-exp-1206
gemini-1.5-flash-8b-exp-0924 | Gemini 1.5 Flash-8B | gemini-1.5-flash-8b
gemini-1.5-flash-8b-exp-0827 | Gemini 1.5 Flash-8B | gemini-1.5-flash-8b
Supported languages
Gemini models are trained to work with the following languages: Arabic (ar), Bengali (bn), Bulgarian (bg), Chinese simplified and traditional (zh), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hebrew (iw), Hindi (hi), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swahili (sw), Swedish (sv), Thai (th), Turkish (tr), Ukrainian (uk), Vietnamese (vi).
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-07 UTC.
|
text_content/docs_models_d8a3b996.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/models#imagen-4
|
| 2 |
+
Title: Gemini models | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Gemini models
2.5 Pro: Our most powerful thinking model with maximum response accuracy and state-of-the-art performance. Input audio, images, video, and text, and get text responses. Tackle difficult problems, analyze large databases, and more. Best for complex coding, reasoning, and multimodal understanding.
2.5 Flash: Our best model in terms of price-performance, offering well-rounded capabilities. Input audio, images, video, and text, and get text responses. The model thinks as needed, or you can configure a thinking budget. Best for low latency, high volume tasks that require thinking.
2.5 Flash-Lite: A Gemini 2.5 Flash model optimized for cost efficiency and low latency. Input audio, images, video, and text, and get text responses. Most cost-efficient model supporting high throughput. Best for real time, low latency use cases.
Note: Gemini 2.5 Pro and 2.5 Flash come with thinking on by default. If you're migrating from a non-thinking model such as 2.0 Pro or Flash, we recommend that you review the Thinking guide first.
Model variants
The Gemini API offers different models that are optimized for specific use cases.
Here's a brief overview of Gemini variants that are available:
Model variant | Input(s) | Output | Optimized for
Gemini 2.5 Pro (gemini-2.5-pro) | Audio, images, videos, text, and PDF | Text | Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more
Gemini 2.5 Flash (gemini-2.5-flash) | Audio, images, videos, and text | Text | Adaptive thinking, cost efficiency
Gemini 2.5 Flash-Lite Preview (gemini-2.5-flash-lite-preview-06-17) | Text, image, video, audio | Text | Most cost-efficient model supporting high throughput
Gemini 2.5 Flash Native Audio (gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog) | Audio, videos, and text | Text and audio, interleaved | High quality, natural conversational audio outputs, with or without thinking
Gemini 2.5 Flash Preview TTS (gemini-2.5-flash-preview-tts) | Text | Audio | Low latency, controllable, single- and multi-speaker text-to-speech audio generation
Gemini 2.5 Pro Preview TTS (gemini-2.5-pro-preview-tts) | Text | Audio | Low latency, controllable, single- and multi-speaker text-to-speech audio generation
Gemini 2.0 Flash (gemini-2.0-flash) | Audio, images, videos, and text | Text | Next generation features, speed, and realtime streaming
Gemini 2.0 Flash Preview Image Generation (gemini-2.0-flash-preview-image-generation) | Audio, images, videos, and text | Text, images | Conversational image generation and editing
Gemini 2.0 Flash-Lite (gemini-2.0-flash-lite) | Audio, images, videos, and text | Text | Cost efficiency and low latency
Gemini 1.5 Flash (gemini-1.5-flash) | Audio, images, videos, and text | Text | Fast and versatile performance across a diverse variety of tasks
Gemini 1.5 Flash-8B (gemini-1.5-flash-8b) | Audio, images, videos, and text | Text | High volume and lower intelligence tasks
Gemini 1.5 Pro (gemini-1.5-pro) | Audio, images, videos, and text | Text | Complex reasoning tasks requiring more intelligence
Gemini Embedding (gemini-embedding-exp) | Text | Text embeddings | Measuring the relatedness of text strings
Imagen 4 (imagen-4.0-generate-preview-06-06, imagen-4.0-ultra-generate-preview-06-06) | Text | Images | Our most up-to-date image generation model
Imagen 3 (imagen-3.0-generate-002) | Text | Images | High quality image generation model
Veo 2 (veo-2.0-generate-001) | Text, images | Video | High quality video generation
Gemini 2.5 Flash Live (gemini-live-2.5-flash-preview) | Audio, video, and text | Text, audio | Low-latency bidirectional voice and video interactions
Gemini 2.0 Flash Live (gemini-2.0-flash-live-001) | Audio, video, and text | Text, audio | Low-latency bidirectional voice and video interactions
You can view the rate limits for each model on the rate limits page.
Gemini 2.5 Pro
Gemini 2.5 Pro is our state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context.
Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro save Supported data types Inputs Audio, images, video, text, and PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-pro Preview: gemini-2.5-pro-preview-06-05 Preview: gemini-2.5-pro-preview-05-06 Preview: gemini-2.5-pro-preview-03-25 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Our best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large scale processing, low-latency, high volume tasks that require thinking, and agentic use cases. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash save Supported data types Inputs Text, images, video, audio Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Audio generation Not supported Caching Supported Code execution Supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Supported Thinking Supported Tuning Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-flash Preview: gemini-2.5-flash-preview-05-20 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash-Lite Preview A Gemini 2.5 Flash model optimized for cost efficiency and low latency. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-lite-preview-06-17 save Supported data types Inputs Text, images, video, and audio Output Text token_auto Token limits [*] Input token limit 1,000,000 Output token limit 64,000 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported URL Context Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-lite-preview-06-17 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Native Audio Our native audio dialog models, with and without thinking, available through the Live API . These models provide interactive and unstructured conversational experiences, with style and control prompting. Try native audio in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-native-audio-dialog & models/gemini-2.5-flash-exp-native-audio-thinking-dialog save Supported data types Inputs Audio, video, text Output Audio and text token_auto Token limits [*] Input token limit 128,000 Output token limit 8,000 handyman Capabilities Audio generation Supported Caching Not supported Code execution Not supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Not supported Thinking Supported Tuning Not supported 123 Versions Read the model version patterns for more details. 
Preview: gemini-2.5-flash-preview-05-20 Experimental: gemini-2.5-flash-exp-native-audio-thinking-dialog calendar_month Latest update May 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Preview Text-to-Speech Gemini 2.5 Flash Preview TTS is our price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Flash rate limits are more restricted since it is an experimental / preview model. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-flash-preview-tts calendar_month Latest update May 2025 Gemini 2.5 Pro Preview Text-to-Speech Gemini 2.5 Pro Preview TTS is our most powerful text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Pro rate limits are more restricted since it is an experimental / preview model. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-pro-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-pro-preview-tts calendar_month Latest update May 2025 Gemini 2.0 Flash Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Not supported Live API Supported Thinking Experimental Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash Stable: gemini-2.0-flash-001 Experimental: gemini-2.0-flash-exp calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Preview Image Generation Gemini 2.0 Flash Preview Image Generation delivers improved image generation features, including generating and editing images conversationally. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-preview-image-generation save Supported data types Inputs Audio, images, video, and text Output Text and images token_auto Token limits [*] Input token limit 32,000 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation is not currently supported in a number of countries in Europe, Middle East & Africa calendar_month Latest update May 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash-Lite A Gemini 2.0 Flash model optimized for cost efficiency and low latency. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-lite save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Not supported Search Not supported Image generation Not supported Audio generation Not supported Live API Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash-lite Stable: gemini-2.0-flash-lite-001 calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 1.5 Flash Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-latest Latest stable: gemini-1.5-flash Stable: gemini-1.5-flash-001 gemini-1.5-flash-002 calendar_month Latest update September 2024 Gemini 1.5 Flash-8B Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash-8b save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-8b-latest Latest stable: gemini-1.5-flash-8b Stable: gemini-1.5-flash-8b-001 calendar_month Latest update October 2024 Gemini 1.5 Pro Try Gemini 2.5 Pro Preview , our most advanced Gemini model to date. 
Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide-range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-pro save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 2,097,152 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 7,200 Maximum video length 2 hours Maximum audio length Approximately 19 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-pro-latest Latest stable: gemini-1.5-pro Stable: gemini-1.5-pro-001 gemini-1.5-pro-002 calendar_month Latest update September 2024 Imagen 4 Imagen 4 is our latest image model, capable of generating highly detailed images with rich lighting, significantly better text rendering, and higher resolution output than previous models. Model details Property Description id_card Model code Gemini API imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit 480 tokens (text) Output images 1 (Ultra) 1 to 4 (Standard) calendar_month Latest update June 2025 Imagen 3 Imagen 3 is our highest quality text-to-image model, capable of generating images with even better detail, richer lighting and fewer distracting artifacts than our previous models. 
Model details Property Description id_card Model code Gemini API imagen-3.0-generate-002 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit N/A Output images Up to 4 calendar_month Latest update February 2025 Veo 2 Veo 2 is our high quality text- and image-to-video model, capable of generating detailed videos, capturing the artistic nuance in your prompts. Model details Property Description id_card Model code Gemini API veo-2.0-generate-001 save Supported data types Input Text, image Output Video token_auto Limits Text input N/A Image input Any image resolution and aspect ratio up to 20MB file size Output video Up to 2 calendar_month Latest update April 2025 Gemini 2.5 Flash Live The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-live-2.5-flash-preview save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-live-2.5-flash-preview calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.0 Flash Live The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-live-001 save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-live-001 calendar_month Latest update April 2025 cognition_2 Knowledge cutoff August 2024 Gemini Embedding Experimental Gemini embedding achieves a SOTA performance across many key dimensions including code, multi-lingual, and retrieval. Gemini Embedding rate limits are more restricted since it is an experimental model. Model details Property Description id_card Model code Gemini API gemini-embedding-exp-03-07 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 8,192 Output dimension size Elastic, supports: 3072, 1536, or 768 calendar_month Latest update March 2025 Text Embedding and Embedding Text Embedding Try our new experimental Gemini embedding model which achieves state-of-the-art performance. Text embeddings are used to measure the relatedness of strings and are widely used in many AI applications. text-embedding-004 achieves a stronger retrieval performance and outperforms existing models with comparable dimensions, on the standard MTEB embedding benchmarks. 
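As a rough illustration of what "measuring the relatedness of strings" can mean in practice, the sketch below compares embedding vectors with cosine similarity. Cosine similarity is a common choice rather than something this page prescribes, and the function name and the tiny 4-dimensional vectors are hypothetical stand-ins for real model output (768 dimensions for text-embedding-004).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction to compare")
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors; real embeddings would come from the API.
cat = [0.9, 0.1, 0.0, 0.2]
kitten = [0.8, 0.2, 0.1, 0.3]
invoice = [0.0, 0.9, 0.8, 0.1]

# Related strings should score higher than unrelated ones.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```

A higher score means the two vectors point in a more similar direction, which is usually read as the underlying strings being more closely related.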
Model details Property Description id_card Model code Gemini API models/text-embedding-004 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update April 2024 Embedding Note: Text Embedding is the newer version of the Embedding model. If you're creating a new project, use Text Embedding. You can use the Embedding model to generate text embeddings for input text. The Embedding model is optimized for creating embeddings with 768 dimensions for text of up to 2,048 tokens. Embedding model details Property Description id_card Model code models/embedding-001 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update December 2023 AQA You can use the AQA model to perform Attributed Question-Answering (AQA)–related tasks over a document, corpus, or a set of passages. The AQA model returns answers to questions that are grounded in provided sources, along with estimating answerable probability. Model details Property Description id_card Model code models/aqa save Supported data types Input Text Output Text language Supported language English token_auto Token limits [*] Input token limit 7,168 Output token limit 1,024 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Supported calendar_month Latest update December 2023 See the examples to explore the capabilities of these model variations. [*] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words. 
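The footnote's rule of thumb (about 4 characters per token) can be turned into a quick pre-flight check against a model's input token limit. The following is a minimal sketch under that assumption only; `estimate_tokens` and `fits_input_limit` are hypothetical helper names, and the result is an approximation, not the output of the model's real tokenizer.

```python
import math

# Rough figure from the footnote above; real tokenization will differ.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate using the ~4 characters per token rule."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_input_limit(text: str, input_token_limit: int) -> bool:
    """Check the estimate against an input token limit (e.g. 2,048 for text-embedding-004)."""
    return estimate_tokens(text) <= input_token_limit


if __name__ == "__main__":
    sample = "Explain to me how AI works"
    print(estimate_tokens(sample), fits_input_limit(sample, 2048))
```

Because the estimate is coarse, it seems safer to leave headroom below the documented limit rather than rely on an exact fit.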
Model version name patterns Gemini models are available in stable, preview, or experimental versions. In your code, you can use one of the following model name formats to specify which model and version you want to use. Latest stable Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation>. For example, gemini-2.0-flash. Stable Points to a specific stable model. Stable models usually don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.0-flash-001. Preview Points to a preview model, which may not be suitable for production use and comes with more restrictive rate limits, but may have billing enabled. To specify a preview version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.5-pro-preview-06-05. Experimental Points to an experimental model, which may not be suitable for production use and comes with more restrictive rate limits. We release experimental models to gather feedback and get our latest updates into the hands of developers quickly. To specify an experimental version, use the following pattern: <model>-<generation>-<variation>-<version>. For example, gemini-2.0-pro-exp-02-05. Experimental models In addition to stable models, the Gemini API offers experimental models, which may not be suitable for production use and come with more restrictive rate limits. We release experimental models to gather feedback, get our latest updates into the hands of developers quickly, and highlight the pace of innovation happening at Google. What we learn from experimental launches informs how we release models more widely. An experimental model can be swapped for another without prior notice.
We don't guarantee that an experimental model will become a stable model in the future. Previous experimental models As new versions or stable releases become available, we remove and replace experimental models. You can find the previous experimental models we released in the following section along with the replacement version: Model code Base model Replacement version gemini-2.5-flash-preview-04-17 Gemini 2.5 Flash gemini-2.5-flash-preview-05-20 gemini-2.0-flash-exp-image-generation Gemini 2.0 Flash gemini-2.0-flash-preview-image-generation gemini-2.5-pro-preview-06-05 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-05-06 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-03-25 Gemini 2.5 Pro gemini-2.5-pro gemini-2.0-flash-thinking-exp-01-21 Gemini 2.5 Flash gemini-2.5-flash-preview-04-17 gemini-2.0-pro-exp-02-05 Gemini 2.0 Pro Experimental gemini-2.5-pro-preview-03-25 gemini-2.0-flash-exp Gemini 2.0 Flash gemini-2.0-flash gemini-exp-1206 Gemini 2.0 Pro gemini-2.0-pro-exp-02-05 gemini-2.0-flash-thinking-exp-1219 Gemini 2.0 Flash Thinking gemini-2.0-flash-thinking-exp-01-21 gemini-exp-1121 Gemini gemini-exp-1206 gemini-exp-1114 Gemini gemini-exp-1206 gemini-1.5-pro-exp-0827 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-pro-exp-0801 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-flash-8b-exp-0924 Gemini 1.5 Flash-8B gemini-1.5-flash-8b gemini-1.5-flash-8b-exp-0827 Gemini 1.5 Flash-8B gemini-1.5-flash-8b Supported languages Gemini models are trained to work with the following languages: Arabic ( ar ) Bengali ( bn ) Bulgarian ( bg ) Chinese simplified and traditional ( zh ) Croatian ( hr ) Czech ( cs ) Danish ( da ) Dutch ( nl ) English ( en ) Estonian ( et ) Finnish ( fi ) French ( fr ) German ( de ) Greek ( el ) Hebrew ( iw ) Hindi ( hi ) Hungarian ( hu ) Indonesian ( id ) Italian ( it ) Japanese ( ja ) Korean ( ko ) Latvian ( lv ) Lithuanian ( lt ) Norwegian ( no ) Polish ( pl ) Portuguese ( pt ) Romanian ( ro ) Russian ( ru ) Serbian ( sr ) Slovak ( sk ) 
Slovenian ( sl ) Spanish ( es ) Swahili ( sw ) Swedish ( sv ) Thai ( th ) Turkish ( tr ) Ukrainian ( uk ) Vietnamese ( vi ) Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-07 UTC.
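The model version name patterns described earlier on this page (latest stable, stable, preview, experimental) can be checked in code before a model name is pinned in a production config. The sketch below is a heuristic derived only from the example names on this page; `classify_model_name` is a hypothetical helper, not part of any Gemini SDK, and the three-digit version check is an assumption based on names like gemini-2.0-flash-001.

```python
import re


def classify_model_name(name: str) -> str:
    """Guess the release channel of a Gemini model name from its naming pattern."""
    # Some model codes on this page carry a "models/" prefix; drop it first.
    if name.startswith("models/"):
        name = name[len("models/"):]
    parts = name.split("-")
    if "exp" in parts:
        return "experimental"
    if "preview" in parts:
        return "preview"
    # Assumption: a trailing three-digit segment (e.g. "001") marks a pinned stable version.
    if re.fullmatch(r"\d{3}", parts[-1]):
        return "stable"
    return "latest stable"


if __name__ == "__main__":
    for model in (
        "gemini-2.0-flash",
        "gemini-2.0-flash-001",
        "gemini-2.5-pro-preview-06-05",
        "gemini-2.0-pro-exp-02-05",
    ):
        print(model, "->", classify_model_name(model))
```

The heuristic has limits: a name such as gemini-2.0-flash-live-001, which this page lists under preview versions, would be classified as stable, so the result is best treated as a hint and confirmed against the model list.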
|
text_content/docs_openai_81698f40.txt
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/openai
|
| 2 |
+
Title: OpenAI compatibility | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
OpenAI compatibility

Gemini models are accessible using the OpenAI libraries (Python and TypeScript / JavaScript) along with the REST API, by updating three lines of code and using your Gemini API key. If you aren't already using the OpenAI libraries, we recommend that you call the Gemini API directly.

Python

from openai import OpenAI

client = OpenAI(
    api_key="GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain to me how AI works"}
    ]
)

print(response.choices[0].message)

JavaScript

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "GEMINI_API_KEY",
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/"
});

const response = await openai.chat.completions.create({
  model: "gemini-2.0-flash",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain to me how AI works" },
  ],
});

console.log(response.choices[0].message);

REST

curl "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer GEMINI_API_KEY" \
  -d '{
    "model": "gemini-2.0-flash",
    "messages": [
      {"role": "user", "content": "Explain to me how AI works"}
    ]
  }'

What changed? Just three lines!
api_key="GEMINI_API_KEY" : Replace " GEMINI_API_KEY " with your actual Gemini API key, which you can get in Google AI Studio . base_url="https://generativelanguage.googleapis.com/v1beta/openai/" : This tells the OpenAI library to send requests to the Gemini API endpoint instead of the default URL. model="gemini-2.0-flash" : Choose a compatible Gemini model Thinking Gemini 2.5 models are trained to think through complex problems, leading to significantly improved reasoning. The Gemini API comes with a "thinking budget" parameter which gives fine grain control over how much the model will think. Unlike the Gemini API, the OpenAI API offers three levels of thinking control: "low" , "medium" , and "high" , which map to 1,024, 8,192, and 24,576 tokens, respectively. If you want to disable thinking, you can set reasoning_effort to "none" (note that reasoning cannot be turned off for 2.5 Pro models). Python from openai import OpenAI client = OpenAI ( api_key = "GEMINI_API_KEY" , base_url = "https://generativelanguage.googleapis.com/v1beta/openai/" ) response = client . chat . completions . create ( model = "gemini-2.5-flash" , reasoning_effort = "low" , messages = [ { "role" : "system" , "content" : "You are a helpful assistant." }, { "role" : "user" , "content" : "Explain to me how AI works" } ] ) print ( response . choices [ 0 ] . message ) JavaScript import OpenAI from "openai" ; const openai = new OpenAI ({ apiKey : "GEMINI_API_KEY" , baseURL : "https://generativelanguage.googleapis.com/v1beta/openai/" }); const response = await openai . chat . completions . create ({ model : "gemini-2.5-flash" , reasoning_effort : "low" , messages : [ { role : "system" , content : "You are a helpful assistant." }, { role : "user" , content : "Explain to me how AI works" , }, ], }); console . log ( response . choices [ 0 ]. 
message ); REST curl "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer GEMINI_API_KEY" \ -d '{ "model": "gemini-2.5-flash", "reasoning_effort": "low", "messages": [ {"role": "user", "content": "Explain to me how AI works"} ] }' Gemini thinking models also produce thought summaries and can use exact thinking budgets . You can use the extra_body field to include these fields in your request. Note that reasoning_effort and thinking_budget overlap functionality, so they can't be used at the same time. Python from openai import OpenAI client = OpenAI ( api_key = "GEMINI_API_KEY" , base_url = "https://generativelanguage.googleapis.com/v1beta/openai/" ) response = client . chat . completions . create ( model = "gemini-2.5-flash" , messages = [{ "role" : "user" , "content" : "Explain to me how AI works" }], extra_body = { 'extra_body' : { "google" : { "thinking_config" : { "thinking_budget" : 800 , "include_thoughts" : True } } } } ) print ( response . choices [ 0 ] . message ) JavaScript import OpenAI from "openai" ; const openai = new OpenAI ({ apiKey : "GEMINI_API_KEY" , baseURL : "https://generativelanguage.googleapis.com/v1beta/openai/" }); const response = await openai . chat . completions . create ({ model : "gemini-2.5-flash" , messages : [{ role : "user" , content : "Explain to me how AI works" ,}], extra_body : { "google" : { "thinking_config" : { "thinking_budget" : 800 , "include_thoughts" : true } } } }); console . log ( response . choices [ 0 ]. message ); REST curl "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer GEMINI_API_KEY" \ -d '{ "model": "gemini-2.5-flash", "messages": [{"role": "user", "content": "Explain to me how AI works"}], "extra_body": { "google": { "thinking_config": { "include_thoughts": true } } } }' Streaming The Gemini API supports streaming responses . 
Python from openai import OpenAI client = OpenAI ( api_key = "GEMINI_API_KEY" , base_url = "https://generativelanguage.googleapis.com/v1beta/openai/" ) response = client . chat . completions . create ( model = "gemini-2.0-flash" , messages = [ { "role" : "system" , "content" : "You are a helpful assistant." }, { "role" : "user" , "content" : "Hello!" } ], stream = True ) for chunk in response : print ( chunk . choices [ 0 ] . delta ) JavaScript import OpenAI from "openai" ; const openai = new OpenAI ({ apiKey : "GEMINI_API_KEY" , baseURL : "https://generativelanguage.googleapis.com/v1beta/openai/" }); async function main () { const completion = await openai . chat . completions . create ({ model : "gemini-2.0-flash" , messages : [ { "role" : "system" , "content" : "You are a helpful assistant." }, { "role" : "user" , "content" : "Hello!" } ], stream : true , }); for await ( const chunk of completion ) { console . log ( chunk . choices [ 0 ]. delta . content ); } } main (); REST curl "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer GEMINI_API_KEY" \ -d '{ "model": "gemini-2.0-flash", "messages": [ {"role": "user", "content": "Explain to me how AI works"} ], "stream": true }' Function calling Function calling makes it easier for you to get structured data outputs from generative models and is supported in the Gemini API . Python from openai import OpenAI client = OpenAI ( api_key = "GEMINI_API_KEY" , base_url = "https://generativelanguage.googleapis.com/v1beta/openai/" ) tools = [ { "type" : "function" , "function" : { "name" : "get_weather" , "description" : "Get the weather in a given location" , "parameters" : { "type" : "object" , "properties" : { "location" : { "type" : "string" , "description" : "The city and state, e.g. 
Chicago, IL",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        }
    }
]

messages = [{"role": "user", "content": "What's the weather like in Chicago today?"}]

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

print(response)

JavaScript

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "GEMINI_API_KEY",
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/"
});

async function main() {
  const messages = [{ "role": "user", "content": "What's the weather like in Chicago today?" }];
  const tools = [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. Chicago, IL",
            },
            "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] },
          },
          "required": ["location"],
        },
      }
    }
  ];

  const response = await openai.chat.completions.create({
    model: "gemini-2.0-flash",
    messages: messages,
    tools: tools,
    tool_choice: "auto",
  });

  console.log(response);
}

main();

REST

curl "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer GEMINI_API_KEY" \
  -d '{
    "model": "gemini-2.0-flash",
    "messages": [
      {
        "role": "user",
        "content": "What'\''s the weather like in Chicago today?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. 
Chicago, IL" }, "unit": { "type": "string", "enum": ["celsius", "fahrenheit"] } }, "required": ["location"] } } } ], "tool_choice": "auto" }' Image understanding Gemini models are natively multimodal and provide best in class performance on many common vision tasks . Python import base64 from openai import OpenAI client = OpenAI ( api_key = "GEMINI_API_KEY" , base_url = "https://generativelanguage.googleapis.com/v1beta/openai/" ) # Function to encode the image def encode_image ( image_path ): with open ( image_path , "rb" ) as image_file : return base64 . b64encode ( image_file . read ()) . decode ( 'utf-8' ) # Getting the base64 string base64_image = encode_image ( "Path/to/agi/image.jpeg" ) response = client . chat . completions . create ( model = "gemini-2.0-flash" , messages = [ { "role" : "user" , "content" : [ { "type" : "text" , "text" : "What is in this image?" , }, { "type" : "image_url" , "image_url" : { "url" : f "data:image/jpeg;base64, { base64_image } " }, }, ], } ], ) print ( response . choices [ 0 ]) JavaScript import OpenAI from "openai" ; import fs from 'fs/promises' ; const openai = new OpenAI ({ apiKey : "GEMINI_API_KEY" , baseURL : "https://generativelanguage.googleapis.com/v1beta/openai/" }); async function encodeImage ( imagePath ) { try { const imageBuffer = await fs . readFile ( imagePath ); return imageBuffer . toString ( 'base64' ); } catch ( error ) { console . error ( "Error encoding image:" , error ); return null ; } } async function main () { const imagePath = "Path/to/agi/image.jpeg" ; const base64Image = await encodeImage ( imagePath ); const messages = [ { "role" : "user" , "content" : [ { "type" : "text" , "text" : "What is in this image?" , }, { "type" : "image_url" , "image_url" : { "url" : `data:image/jpeg;base64, ${ base64Image } ` }, }, ], } ]; try { const response = await openai . chat . completions . create ({ model : "gemini-2.0-flash" , messages : messages , }); console . log ( response . 
choices [ 0 ]); } catch ( error ) { console . error ( "Error calling Gemini API:" , error ); } } main (); REST bash -c ' base64_image=$(base64 -i "Path/to/agi/image.jpeg"); curl "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer GEMINI_API_KEY" \ -d "{ \"model\": \"gemini-2.0-flash\", \"messages\": [ { \"role\": \"user\", \"content\": [ { \"type\": \"text\", \"text\": \"What is in this image?\" }, { \"type\": \"image_url\", \"image_url\": { \"url\": \"data:image/jpeg;base64,${base64_image}\" } } ] } ] }" ' Generate an image Note: Image generation is only available in the paid tier. Generate an image: Python import base64 from openai import OpenAI from PIL import Image from io import BytesIO client = OpenAI ( api_key = "GEMINI_API_KEY" , base_url = "https://generativelanguage.googleapis.com/v1beta/openai/" , ) response = client . images . generate ( model = "imagen-3.0-generate-002" , prompt = "a portrait of a sheepadoodle wearing a cape" , response_format = 'b64_json' , n = 1 , ) for image_data in response . data : image = Image . open ( BytesIO ( base64 . b64decode ( image_data . b64_json ))) image . show () JavaScript import OpenAI from "openai" ; const openai = new OpenAI ({ apiKey : "GEMINI_API_KEY" , baseURL : "https://generativelanguage.googleapis.com/v1beta/openai/" , }); async function main () { const image = await openai . images . generate ( { model : "imagen-3.0-generate-002" , prompt : "a portrait of a sheepadoodle wearing a cape" , response_format : "b64_json" , n : 1 , } ); console . log ( image . 
data);
}

main();

REST

curl "https://generativelanguage.googleapis.com/v1beta/openai/images/generations" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer GEMINI_API_KEY" \
  -d '{
    "model": "imagen-3.0-generate-002",
    "prompt": "a portrait of a sheepadoodle wearing a cape",
    "response_format": "b64_json",
    "n": 1
  }'

Audio understanding

Analyze audio input:

Python

import base64
from openai import OpenAI

client = OpenAI(
    api_key="GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

with open("/path/to/your/audio/file.wav", "rb") as audio_file:
    base64_audio = base64.b64encode(audio_file.read()).decode('utf-8')

response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Transcribe this audio",
                },
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": base64_audio,
                        "format": "wav"
                    }
                }
            ],
        }
    ],
)

print(response.choices[0].message.content)

JavaScript

import fs from "fs";
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "GEMINI_API_KEY",
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
});

const audioFile = fs.readFileSync("/path/to/your/audio/file.wav");
const base64Audio = Buffer.from(audioFile).toString("base64");

async function main() {
  const response = await client.chat.completions.create({
    model: "gemini-2.0-flash",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Transcribe this audio",
          },
          {
            type: "input_audio",
            input_audio: {
              data: base64Audio,
              format: "wav",
            },
          },
        ],
      },
    ],
  });

  console.log(response.choices[0].message.content);
}

main();

REST

Note: If you get an "Argument list too long" error, the base64 encoding of your audio file might be too long to pass to curl as a command-line argument.
bash -c ' base64_audio=$(base64 -i "/path/to/your/audio/file.wav"); curl "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions" \ -H "Content-Type: application/json" \ -H "Authorization: Bearer GEMINI_API_KEY" \ -d "{ \"model\": \"gemini-2.0-flash\", \"messages\": [ { \"role\": \"user\", \"content\": [ { \"type\": \"text\", \"text\": \"Transcribe this audio file.\" }, { \"type\": \"input_audio\", \"input_audio\": { \"data\": \"${base64_audio}\", \"format\": \"wav\" } } ] } ] }" ' Structured output Gemini models can output JSON objects in any structure you define . Python from pydantic import BaseModel from openai import OpenAI client = OpenAI ( api_key = "GEMINI_API_KEY" , base_url = "https://generativelanguage.googleapis.com/v1beta/openai/" ) class CalendarEvent ( BaseModel ): name : str date : str participants : list [ str ] completion = client . beta . chat . completions . parse ( model = "gemini-2.0-flash" , messages = [ { "role" : "system" , "content" : "Extract the event information." }, { "role" : "user" , "content" : "John and Susan are going to an AI conference on Friday." }, ], response_format = CalendarEvent , ) print ( completion . choices [ 0 ] . message . parsed ) JavaScript import OpenAI from "openai" ; import { zodResponseFormat } from "openai/helpers/zod" ; import { z } from "zod" ; const openai = new OpenAI ({ apiKey : "GEMINI_API_KEY" , baseURL : "https://generativelanguage.googleapis.com/v1beta/openai" }); const CalendarEvent = z . object ({ name : z . string (), date : z . string (), participants : z . array ( z . string ()), }); const completion = await openai . beta . chat . completions . parse ({ model : "gemini-2.0-flash" , messages : [ { role : "system" , content : "Extract the event information." }, { role : "user" , content : "John and Susan are going to an AI conference on Friday" }, ], response_format : zodResponseFormat ( CalendarEvent , "event" ), }); const event = completion . choices [ 0 ]. message . 
parsed;
console.log(event);

Embeddings

Text embeddings measure the relatedness of text strings and can be generated using the Gemini API.

Python

from openai import OpenAI

client = OpenAI(
    api_key="GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

response = client.embeddings.create(
    input="Your text string goes here",
    model="text-embedding-004"
)

print(response.data[0].embedding)

JavaScript

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "GEMINI_API_KEY",
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/"
});

async function main() {
  const embedding = await openai.embeddings.create({
    model: "text-embedding-004",
    input: "Your text string goes here",
  });

  console.log(embedding);
}

main();

REST

curl "https://generativelanguage.googleapis.com/v1beta/openai/embeddings" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer GEMINI_API_KEY" \
  -d '{
    "input": "Your text string goes here",
    "model": "text-embedding-004"
  }'

extra_body

There are several features supported by Gemini that are not available in OpenAI models but can be enabled using the extra_body field.

extra_body features
safety_settings: Corresponds to Gemini's SafetySetting.
cached_content: Corresponds to Gemini's GenerateContentRequest.cached_content.
thinking_config: Corresponds to Gemini's ThinkingConfig.

cached_content

Here's an example of using extra_body to set cached_content:

Python

from openai import OpenAI

client = OpenAI(
    api_key="GEMINI_API_KEY",
    # Same OpenAI-compatible endpoint as the other examples on this page.
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

stream = client.chat.completions.create(
    model="gemini-2.5-pro",
    n=1,
    messages=[
        {
            "role": "user",
            "content": "Summarize the video"
        }
    ],
    stream=True,
    stream_options={'include_usage': True},
    extra_body={
        'extra_body': {
            'google': {
                'cached_content': "cachedContents/0000aaaa1111bbbb2222cccc3333dddd4444eeee"
            }
        }
    }
)

for chunk in stream:
    print(chunk)
    # Usage may be absent on some chunks, so guard before reading it.
    if chunk.usage is not None:
        print(chunk.usage.to_dict())

List models

Get a list of available Gemini models:

Python

from openai import OpenAI

client = OpenAI(
    api_key="GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

models = client.models.list()
for model in models:
    print(model.id)

JavaScript

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "GEMINI_API_KEY",
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
});

async function main() {
  const list = await openai.models.list();

  for await (const model of list) {
    console.log(model);
  }
}

main();

REST

curl https://generativelanguage.googleapis.com/v1beta/openai/models \
  -H "Authorization: Bearer GEMINI_API_KEY"

Retrieve a model

Retrieve a Gemini model:

Python

from openai import OpenAI

client = OpenAI(
    api_key="GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

model = client.models.retrieve("gemini-2.0-flash")
print(model.id)

JavaScript

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: "GEMINI_API_KEY",
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
});

async function main() {
  const model = await openai.models.retrieve("gemini-2.0-flash");
  console.log(model.id);
}

main();

REST

curl https://generativelanguage.googleapis.com/v1beta/openai/models/gemini-2.0-flash \
  -H "Authorization: Bearer GEMINI_API_KEY"

Current limitations

Support for the OpenAI libraries is still in beta while we extend feature support.
If you have questions about supported parameters or upcoming features, or run into any issues getting started with Gemini, join our Developer Forum. What's next Try our OpenAI Compatibility Colab to work through more detailed examples. Last updated 2025-06-18 UTC.
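The Thinking section of this page states that reasoning_effort levels "low", "medium", and "high" map to 1,024, 8,192, and 24,576 thinking tokens, and that reasoning_effort and thinking_budget can't be used at the same time. A minimal sketch of a helper that enforces that rule before a request is built might look as follows. `thinking_kwargs` is a hypothetical name, and the nested extra_body shape simply mirrors the Python example on this page rather than a documented schema.

```python
# Token budgets that the page says each reasoning_effort level maps to.
REASONING_EFFORT_BUDGETS = {"low": 1024, "medium": 8192, "high": 24576}


def thinking_kwargs(reasoning_effort=None, thinking_budget=None, include_thoughts=False):
    """Build keyword arguments for client.chat.completions.create(...).

    Exactly one of reasoning_effort or thinking_budget may be given,
    because the two settings overlap in functionality.
    """
    if reasoning_effort is not None and thinking_budget is not None:
        raise ValueError("Use either reasoning_effort or thinking_budget, not both.")
    if reasoning_effort is not None:
        # "none" disables thinking where the model allows it.
        if reasoning_effort != "none" and reasoning_effort not in REASONING_EFFORT_BUDGETS:
            raise ValueError(f"Unknown reasoning_effort: {reasoning_effort!r}")
        return {"reasoning_effort": reasoning_effort}
    if thinking_budget is not None:
        # Assumption: same nesting as the Python extra_body example on this page.
        return {
            "extra_body": {
                "extra_body": {
                    "google": {
                        "thinking_config": {
                            "thinking_budget": thinking_budget,
                            "include_thoughts": include_thoughts,
                        }
                    }
                }
            }
        }
    return {}


if __name__ == "__main__":
    print(thinking_kwargs(reasoning_effort="low"))
    print(thinking_kwargs(thinking_budget=800, include_thoughts=True))
```

The returned dictionary can be unpacked into the call, for example `client.chat.completions.create(model=..., messages=..., **thinking_kwargs(thinking_budget=800))`, which keeps the mutual-exclusion check in one place.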
|
text_content/docs_pricing_a0fd1cc1.txt
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/pricing#main-content
|
| 2 |
+
Title: Gemini Developer API Pricing | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Gemini Developer API Pricing

The Gemini API "free tier" is offered through the API service with lower rate limits for testing purposes. Google AI Studio usage is completely free in all available countries. The Gemini API "paid tier" comes with higher rate limits, additional features, and different data handling.

If you're looking to reduce costs and your use case doesn't require immediate real-time responses, check out Batch Mode. Batch Mode is designed to process large volumes of requests asynchronously. Requests submitted using this mode are priced at 50% of the price of interactive (non-batch mode) requests.

Gemini 2.5 Pro

Our state-of-the-art multipurpose model, which excels at coding and complex reasoning tasks.

Paid Tier prices are per 1M tokens in USD.
Input price. Free Tier: Free of charge. Paid Tier: $1.25, prompts <= 200k tokens; $2.50, prompts > 200k tokens.
Output price (including thinking tokens). Free Tier: Free of charge. Paid Tier: $10.00, prompts <= 200k tokens; $15.00, prompts > 200k tokens.
Context caching price. Free Tier: Not available. Paid Tier: $0.31, prompts <= 200k tokens; $0.625, prompts > 200k tokens; $4.50 / 1,000,000 tokens per hour (storage price).
Grounding with Google Search. Free Tier: Not available. Paid Tier: 1,500 RPD (free), then $35 / 1,000 requests.
Used to improve our products. Free Tier: Yes. Paid Tier: No.

Gemini 2.5 Flash

Our first hybrid reasoning model, which supports a 1M token context window and has thinking budgets.
Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.30 (text / image / video) $1.00 (audio) Output price (including thinking tokens) Free of charge $2.50 Context caching price Not available $0.075 (text / image / video) $0.25 (audio) $1.00 / 1,000,000 tokens per hour (storage price) Grounding with Google Search Free of charge, up to 500 RPD (limit shared with Flash-Lite RPD) 1,500 RPD (free, limit shared with Flash-Lite RPD), then $35 / 1,000 requests Live API Free of charge Input: $0.50 (text), $3.00 (audio / image [video]) Output: $2.00 (text), $12.00 (audio) Used to improve our products Yes No Gemini 2.5 Flash-Lite Preview Try it in Google AI Studio Our smallest and most cost effective model, built for at scale usage. Preview models may change before becoming stable and have more restrictive rate limits. Free Tier Paid Tier, per 1M tokens in USD Input price (text, image, video) Free of charge $0.10 (text / image / video) $0.50 (audio) Output price (including thinking tokens) Free of charge $0.40 Context caching price Not available $0.025 (text / image / video) $0.125 (audio) $1.00 / 1,000,000 tokens per hour (storage price) Grounding with Google Search Free of charge, up to 500 RPD (limit shared with Flash RPD) 1,500 RPD (free, limit shared with Flash RPD), then $35 / 1,000 requests Used to improve our products Yes No Gemini 2.5 Flash Native Audio Try it in Google AI Studio Our native audio models optimized for higher quality audio outputs with better pacing, voice naturalness, verbosity, and mood. Preview models may change before becoming stable and have more restrictive rate limits. 
Free Tier Paid Tier, per 1M tokens in USD Input price Not available $0.50 (text) $3.00 (audio / video) Output price (including thinking tokens) Not available $2.00 (text) $12.00 (audio) Used to improve our products Yes No Gemini 2.5 Flash Preview TTS Try it in Google AI Studio Our 2.5 Flash text-to-speech audio model optimized for price-performant, low-latency, controllable speech generation. Preview models may change before becoming stable and have more restrictive rate limits. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.50 (text) Output price Free of charge $10.00 (audio) Used to improve our products Yes No Gemini 2.5 Pro Preview TTS Try it in Google AI Studio Our 2.5 Pro text-to-speech audio model optimized for powerful, low-latency speech generation for more natural outputs and easier to steer prompts. Preview models may change before becoming stable and have more restrictive rate limits. Free Tier Paid Tier, per 1M tokens in USD Input price Not available $1.00 (text) Output price Not available $20.00 (audio) Used to improve our products Yes No Gemini 2.0 Flash Try it in Google AI Studio Our most balanced multimodal model with great performance across all tasks, with a 1 million token context window, and built for the era of Agents. 
Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.10 (text / image / video) $0.70 (audio) Output price Free of charge $0.40 Context caching price Free of charge $0.025 / 1,000,000 tokens (text/image/video) $0.175 / 1,000,000 tokens (audio) Context caching (storage) Free of charge, up to 1,000,000 tokens of storage per hour $1.00 / 1,000,000 tokens per hour Image generation pricing Free of charge $0.039 per image* Tuning price Not available Not available Grounding with Google Search Free of charge, up to 500 RPD 1,500 RPD (free), then $35 / 1,000 requests Live API Free of charge Input: $0.35 (text), $2.10 (audio / image [video]) Output: $1.50 (text), $8.50 (audio) Used to improve our products Yes No [*] Image output is priced at $30 per 1,000,000 tokens. Output images up to 1024x1024px consume 1290 tokens and are equivalent to $0.039 per image. Gemini 2.0 Flash-Lite Try it in Google AI Studio Our smallest and most cost effective model, built for at scale usage. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.075 Output price Free of charge $0.30 Context caching price Not available Not available Context caching (storage) Not available Not available Tuning price Not available Not available Grounding with Google Search Not available Not available Used to improve our products Yes No Imagen 4 Preview Try it in Google AI Studio Our latest image generation model, with significantly better text rendering and better overall image quality. Preview models may change before becoming stable and have more restrictive rate limits. Free Tier Paid Tier, per Image in USD Imagen 4 Standard image price Not available $0.04 Imagen 4 Ultra image price Not available $0.06 Used to improve our products Yes No Imagen 3 Try it in Google AI Studio Our state-of-the-art image generation model, available to developers on the paid tier of the Gemini API. 
Free Tier Paid Tier, per Image in USD Image price Not available $0.03 Used to improve our products Yes No Veo 2 Try the API Our state-of-the-art video generation model, available to developers on the paid tier of the Gemini API. Free Tier Paid Tier, per second in USD Video price Not available $0.35 Used to improve our products Yes No Gemma 3 Try Gemma 3 Our lightweight, state-of-the-art, open model built from the same technology that powers our Gemini models. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge Not available Output price Free of charge Not available Context caching price Free of charge Not available Context caching (storage) Free of charge Not available Tuning price Not available Not available Grounding with Google Search Not available Not available Used to improve our products Yes No Gemma 3n Try Gemma 3n Our open model built for efficient performance on everyday devices like mobile phones, laptops, and tablets. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge Not available Output price Free of charge Not available Context caching price Free of charge Not available Context caching (storage) Free of charge Not available Tuning price Not available Not available Grounding with Google Search Not available Not available Used to improve our products Yes No Gemini 1.5 Flash Try it in Google AI Studio Our fastest multimodal model with great performance for diverse, repetitive tasks and a 1 million token context window. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.075, prompts <= 128k tokens $0.15, prompts > 128k tokens Output price Free of charge $0.30, prompts <= 128k tokens $0.60, prompts > 128k tokens Context caching price Free of charge, up to 1 million tokens of storage per hour $0.01875, prompts <= 128k tokens $0.0375, prompts > 128k tokens Context caching (storage) Free of charge $1.00 per hour Tuning price Token prices are the same for tuned models Tuning service is free of charge.
Token prices are the same for tuned models Tuning service is free of charge. Grounding with Google Search Not available $35 / 1K grounding requests Used to improve our products Yes No Gemini 1.5 Flash-8B Try it in Google AI Studio Our smallest model for lower intelligence use cases, with a 1 million token context window. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.0375, prompts <= 128k tokens $0.075, prompts > 128k tokens Output price Free of charge $0.15, prompts <= 128k tokens $0.30, prompts > 128k tokens Context caching price Free of charge, up to 1 million tokens of storage per hour $0.01, prompts <= 128k tokens $0.02, prompts > 128k tokens Context caching (storage) Free of charge $0.25 per hour Tuning price Token prices are the same for tuned models Tuning service is free of charge. Token prices are the same for tuned models Tuning service is free of charge. Grounding with Google Search Not available $35 / 1K grounding requests Used to improve our products Yes No Gemini 1.5 Pro Try it in Google AI Studio Our highest intelligence Gemini 1.5 series model, with a breakthrough 2 million token context window. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $1.25, prompts <= 128k tokens $2.50, prompts > 128k tokens Output price Free of charge $5.00, prompts <= 128k tokens $10.00, prompts > 128k tokens Context caching price Not available $0.3125, prompts <= 128k tokens $0.625, prompts > 128k tokens Context caching (storage) Not available $4.50 per hour Tuning price Not available Not available Grounding with Google Search Not available $35 / 1K grounding requests Used to improve our products Yes No Text Embedding 004 Our state-of-the-art text embedding model. 
Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge Not available Output price Free of charge Not available Tuning price Not available Not available Used to improve our products Yes No [*] Google AI Studio usage is free of charge in all available regions. See Billing FAQs for details. [**] Prices may differ from the prices listed here and the prices offered on Vertex AI. For Vertex prices, see the Vertex AI pricing page. [***] If you are using dynamic retrieval to optimize costs, only requests that contain at least one grounding support URL from the web in their response are charged for Grounding with Google Search. Costs for Gemini always apply. Rate limits are subject to change. Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-07 UTC.
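The per-1M-token prices above translate into a per-request estimate with simple arithmetic. The following is a minimal sketch using the Gemini 2.5 Flash paid-tier figures listed on this page ($0.30 text/image/video input, $1.00 audio input, $2.50 output). The function name `estimate_flash_cost` and the dictionary layout are hypothetical, not part of any Gemini SDK, and actual billing may differ.

```python
from decimal import Decimal

# Paid-tier prices per 1M tokens in USD, copied from the Gemini 2.5 Flash table.
# The names below are illustrative only; they are not part of an official API.
FLASH_INPUT_PER_M = {
    "text": Decimal("0.30"),   # also covers image and video input
    "audio": Decimal("1.00"),
}
FLASH_OUTPUT_PER_M = Decimal("2.50")  # output price includes thinking tokens
MILLION = Decimal(1_000_000)


def estimate_flash_cost(input_tokens: int, output_tokens: int, modality: str = "text") -> Decimal:
    """Estimate the USD cost of one paid-tier request (hypothetical helper)."""
    input_cost = Decimal(input_tokens) / MILLION * FLASH_INPUT_PER_M[modality]
    output_cost = Decimal(output_tokens) / MILLION * FLASH_OUTPUT_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # 200k text input tokens and 10k output tokens
    print(estimate_flash_cost(200_000, 10_000))
```

`Decimal` is used instead of `float` so that small per-token amounts do not pick up binary rounding noise.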
|
text_content/docs_pricing_b1f23843.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/pricing
|
| 2 |
+
Title: Gemini Developer API Pricing | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Gemini Developer API Pricing The Gemini API "free tier" is offered through the API service with lower rate limits for testing purposes. Google AI Studio usage is completely free in all available countries. The Gemini API "paid tier" comes with higher rate limits, additional features, and different data handling. Upgrade to the Paid Tier If you're looking to reduce costs and your use case doesn't require immediate real-time responses, check out Batch Mode. Batch Mode is designed to process large volumes of requests asynchronously. Requests submitted using this mode are charged at 50% of the price of interactive (non-batch mode) requests. Gemini 2.5 Pro Try it in Google AI Studio Our state-of-the-art multipurpose model, which excels at coding and complex reasoning tasks. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $1.25, prompts <= 200k tokens $2.50, prompts > 200k tokens Output price (including thinking tokens) Free of charge $10.00, prompts <= 200k tokens $15.00, prompts > 200k tokens Context caching price Not available $0.31, prompts <= 200k tokens $0.625, prompts > 200k tokens $4.50 / 1,000,000 tokens per hour (storage price) Grounding with Google Search Not available 1,500 RPD (free), then $35 / 1,000 requests Used to improve our products Yes No Gemini 2.5 Flash Try it in Google AI Studio Our first hybrid reasoning model which supports a 1M token context window and has thinking budgets.
Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.30 (text / image / video) $1.00 (audio) Output price (including thinking tokens) Free of charge $2.50 Context caching price Not available $0.075 (text / image / video) $0.25 (audio) $1.00 / 1,000,000 tokens per hour (storage price) Grounding with Google Search Free of charge, up to 500 RPD (limit shared with Flash-Lite RPD) 1,500 RPD (free, limit shared with Flash-Lite RPD), then $35 / 1,000 requests Live API Free of charge Input: $0.50 (text), $3.00 (audio / image [video]) Output: $2.00 (text), $12.00 (audio) Used to improve our products Yes No Gemini 2.5 Flash-Lite Preview Try it in Google AI Studio Our smallest and most cost effective model, built for at scale usage. Preview models may change before becoming stable and have more restrictive rate limits. Free Tier Paid Tier, per 1M tokens in USD Input price (text, image, video) Free of charge $0.10 (text / image / video) $0.50 (audio) Output price (including thinking tokens) Free of charge $0.40 Context caching price Not available $0.025 (text / image / video) $0.125 (audio) $1.00 / 1,000,000 tokens per hour (storage price) Grounding with Google Search Free of charge, up to 500 RPD (limit shared with Flash RPD) 1,500 RPD (free, limit shared with Flash RPD), then $35 / 1,000 requests Used to improve our products Yes No Gemini 2.5 Flash Native Audio Try it in Google AI Studio Our native audio models optimized for higher quality audio outputs with better pacing, voice naturalness, verbosity, and mood. Preview models may change before becoming stable and have more restrictive rate limits. 
Free Tier Paid Tier, per 1M tokens in USD Input price Not available $0.50 (text) $3.00 (audio / video) Output price (including thinking tokens) Not available $2.00 (text) $12.00 (audio) Used to improve our products Yes No Gemini 2.5 Flash Preview TTS Try it in Google AI Studio Our 2.5 Flash text-to-speech audio model optimized for price-performant, low-latency, controllable speech generation. Preview models may change before becoming stable and have more restrictive rate limits. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.50 (text) Output price Free of charge $10.00 (audio) Used to improve our products Yes No Gemini 2.5 Pro Preview TTS Try it in Google AI Studio Our 2.5 Pro text-to-speech audio model optimized for powerful, low-latency speech generation for more natural outputs and easier to steer prompts. Preview models may change before becoming stable and have more restrictive rate limits. Free Tier Paid Tier, per 1M tokens in USD Input price Not available $1.00 (text) Output price Not available $20.00 (audio) Used to improve our products Yes No Gemini 2.0 Flash Try it in Google AI Studio Our most balanced multimodal model with great performance across all tasks, with a 1 million token context window, and built for the era of Agents. 
Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.10 (text / image / video) $0.70 (audio) Output price Free of charge $0.40 Context caching price Free of charge $0.025 / 1,000,000 tokens (text/image/video) $0.175 / 1,000,000 tokens (audio) Context caching (storage) Free of charge, up to 1,000,000 tokens of storage per hour $1.00 / 1,000,000 tokens per hour Image generation pricing Free of charge $0.039 per image* Tuning price Not available Not available Grounding with Google Search Free of charge, up to 500 RPD 1,500 RPD (free), then $35 / 1,000 requests Live API Free of charge Input: $0.35 (text), $2.10 (audio / image [video]) Output: $1.50 (text), $8.50 (audio) Used to improve our products Yes No [*] Image output is priced at $30 per 1,000,000 tokens. Output images up to 1024x1024px consume 1290 tokens and are equivalent to $0.039 per image. Gemini 2.0 Flash-Lite Try it in Google AI Studio Our smallest and most cost effective model, built for at scale usage. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.075 Output price Free of charge $0.30 Context caching price Not available Not available Context caching (storage) Not available Not available Tuning price Not available Not available Grounding with Google Search Not available Not available Used to improve our products Yes No Imagen 4 Preview Try it in Google AI Studio Our latest image generation model, with significantly better text rendering and better overall image quality. Preview models may change before becoming stable and have more restrictive rate limits. Free Tier Paid Tier, per Image in USD Imagen 4 Standard image price Not available $0.04 Imagen 4 Ultra image price Not available $0.06 Used to improve our products Yes No Imagen 3 Try it in Google AI Studio Our state-of-the-art image generation model, available to developers on the paid tier of the Gemini API. 
Free Tier Paid Tier, per Image in USD Image price Not available $0.03 Used to improve our products Yes No Veo 2 Try the API Our state-of-the-art video generation model, available to developers on the paid tier of the Gemini API. Free Tier Paid Tier, per second in USD Video price Not available $0.35 Used to improve our products Yes No Gemma 3 Try Gemma 3 Our lightweight, state-of-the-art, open model built from the same technology that powers our Gemini models. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge Not available Output price Free of charge Not available Context caching price Free of charge Not available Context caching (storage) Free of charge Not available Tuning price Not available Not available Grounding with Google Search Not available Not available Used to improve our products Yes No Gemma 3n Try Gemma 3n Our open model built for efficient performance on everyday devices like mobile phones, laptops, and tablets. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge Not available Output price Free of charge Not available Context caching price Free of charge Not available Context caching (storage) Free of charge Not available Tuning price Not available Not available Grounding with Google Search Not available Not available Used to improve our products Yes No Gemini 1.5 Flash Try it in Google AI Studio Our fastest multimodal model with great performance for diverse, repetitive tasks and a 1 million token context window. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.075, prompts <= 128k tokens $0.15, prompts > 128k tokens Output price Free of charge $0.30, prompts <= 128k tokens $0.60, prompts > 128k tokens Context caching price Free of charge, up to 1 million tokens of storage per hour $0.01875, prompts <= 128k tokens $0.0375, prompts > 128k tokens Context caching (storage) Free of charge $1.00 per hour Tuning price Token prices are the same for tuned models Tuning service is free of charge.
Token prices are the same for tuned models Tuning service is free of charge. Grounding with Google Search Not available $35 / 1K grounding requests Used to improve our products Yes No Gemini 1.5 Flash-8B Try it in Google AI Studio Our smallest model for lower intelligence use cases, with a 1 million token context window. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.0375, prompts <= 128k tokens $0.075, prompts > 128k tokens Output price Free of charge $0.15, prompts <= 128k tokens $0.30, prompts > 128k tokens Context caching price Free of charge, up to 1 million tokens of storage per hour $0.01, prompts <= 128k tokens $0.02, prompts > 128k tokens Context caching (storage) Free of charge $0.25 per hour Tuning price Token prices are the same for tuned models Tuning service is free of charge. Token prices are the same for tuned models Tuning service is free of charge. Grounding with Google Search Not available $35 / 1K grounding requests Used to improve our products Yes No Gemini 1.5 Pro Try it in Google AI Studio Our highest intelligence Gemini 1.5 series model, with a breakthrough 2 million token context window. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $1.25, prompts <= 128k tokens $2.50, prompts > 128k tokens Output price Free of charge $5.00, prompts <= 128k tokens $10.00, prompts > 128k tokens Context caching price Not available $0.3125, prompts <= 128k tokens $0.625, prompts > 128k tokens Context caching (storage) Not available $4.50 per hour Tuning price Not available Not available Grounding with Google Search Not available $35 / 1K grounding requests Used to improve our products Yes No Text Embedding 004 Our state-of-the-art text embedding model. 
Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge Not available Output price Free of charge Not available Tuning price Not available Not available Used to improve our products Yes No [*] Google AI Studio usage is free of charge in all available regions. See Billing FAQs for details. [**] Prices may differ from the prices listed here and the prices offered on Vertex AI. For Vertex prices, see the Vertex AI pricing page. [***] If you are using dynamic retrieval to optimize costs, only requests that contain at least one grounding support URL from the web in their response are charged for Grounding with Google Search. Costs for Gemini always apply. Rate limits are subject to change. Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-07 UTC.
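The Gemini 2.5 Pro table prices a request by prompt size, and Batch Mode is described as costing 50% of the interactive price. A rough sketch of how those two rules might combine is below. It assumes that "200k" means 200,000 tokens, that the higher rate applies to every token in a request whose prompt exceeds the threshold, and that the batch discount applies uniformly to input and output; none of these details are confirmed by the page. `estimate_pro_cost` is a hypothetical helper.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)
TIER_THRESHOLD = 200_000  # assumption: "200k" on the pricing page means 200,000 tokens


def estimate_pro_cost(prompt_tokens: int, output_tokens: int, batch: bool = False) -> Decimal:
    """Estimate the USD cost of one Gemini 2.5 Pro paid-tier request (hypothetical helper)."""
    # The tier is chosen by prompt size and, by assumption, applies to the whole request.
    long_prompt = prompt_tokens > TIER_THRESHOLD
    input_rate = Decimal("2.50") if long_prompt else Decimal("1.25")
    output_rate = Decimal("15.00") if long_prompt else Decimal("10.00")

    cost = (Decimal(prompt_tokens) * input_rate + Decimal(output_tokens) * output_rate) / MILLION

    # Batch Mode is listed at 50% of the interactive price.
    return cost / 2 if batch else cost


if __name__ == "__main__":
    print(estimate_pro_cost(100_000, 10_000))              # short prompt, interactive
    print(estimate_pro_cost(300_000, 10_000))              # long prompt, interactive
    print(estimate_pro_cost(300_000, 10_000, batch=True))  # long prompt, batch
```

Context caching, grounding requests, and storage charges are left out to keep the sketch short.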
|
text_content/docs_pricing_cda76b07.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/pricing#veo-2
|
| 2 |
+
Title: Gemini Developer API Pricing | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Gemini Developer API Pricing The Gemini API "free tier" is offered through the API service with lower rate limits for testing purposes. Google AI Studio usage is completely free in all available countries. The Gemini API "paid tier" comes with higher rate limits, additional features, and different data handling. Upgrade to the Paid Tier If you're looking to reduce costs and your use case doesn't require immediate real-time responses, check out Batch Mode. Batch Mode is designed to process large volumes of requests asynchronously. Requests submitted using this mode are charged at 50% of the price of interactive (non-batch mode) requests. Gemini 2.5 Pro Try it in Google AI Studio Our state-of-the-art multipurpose model, which excels at coding and complex reasoning tasks. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $1.25, prompts <= 200k tokens $2.50, prompts > 200k tokens Output price (including thinking tokens) Free of charge $10.00, prompts <= 200k tokens $15.00, prompts > 200k tokens Context caching price Not available $0.31, prompts <= 200k tokens $0.625, prompts > 200k tokens $4.50 / 1,000,000 tokens per hour (storage price) Grounding with Google Search Not available 1,500 RPD (free), then $35 / 1,000 requests Used to improve our products Yes No Gemini 2.5 Flash Try it in Google AI Studio Our first hybrid reasoning model which supports a 1M token context window and has thinking budgets.
Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.30 (text / image / video) $1.00 (audio) Output price (including thinking tokens) Free of charge $2.50 Context caching price Not available $0.075 (text / image / video) $0.25 (audio) $1.00 / 1,000,000 tokens per hour (storage price) Grounding with Google Search Free of charge, up to 500 RPD (limit shared with Flash-Lite RPD) 1,500 RPD (free, limit shared with Flash-Lite RPD), then $35 / 1,000 requests Live API Free of charge Input: $0.50 (text), $3.00 (audio / image [video]) Output: $2.00 (text), $12.00 (audio) Used to improve our products Yes No Gemini 2.5 Flash-Lite Preview Try it in Google AI Studio Our smallest and most cost effective model, built for at scale usage. Preview models may change before becoming stable and have more restrictive rate limits. Free Tier Paid Tier, per 1M tokens in USD Input price (text, image, video) Free of charge $0.10 (text / image / video) $0.50 (audio) Output price (including thinking tokens) Free of charge $0.40 Context caching price Not available $0.025 (text / image / video) $0.125 (audio) $1.00 / 1,000,000 tokens per hour (storage price) Grounding with Google Search Free of charge, up to 500 RPD (limit shared with Flash RPD) 1,500 RPD (free, limit shared with Flash RPD), then $35 / 1,000 requests Used to improve our products Yes No Gemini 2.5 Flash Native Audio Try it in Google AI Studio Our native audio models optimized for higher quality audio outputs with better pacing, voice naturalness, verbosity, and mood. Preview models may change before becoming stable and have more restrictive rate limits. 
Free Tier Paid Tier, per 1M tokens in USD Input price Not available $0.50 (text) $3.00 (audio / video) Output price (including thinking tokens) Not available $2.00 (text) $12.00 (audio) Used to improve our products Yes No Gemini 2.5 Flash Preview TTS Try it in Google AI Studio Our 2.5 Flash text-to-speech audio model optimized for price-performant, low-latency, controllable speech generation. Preview models may change before becoming stable and have more restrictive rate limits. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.50 (text) Output price Free of charge $10.00 (audio) Used to improve our products Yes No Gemini 2.5 Pro Preview TTS Try it in Google AI Studio Our 2.5 Pro text-to-speech audio model optimized for powerful, low-latency speech generation for more natural outputs and easier to steer prompts. Preview models may change before becoming stable and have more restrictive rate limits. Free Tier Paid Tier, per 1M tokens in USD Input price Not available $1.00 (text) Output price Not available $20.00 (audio) Used to improve our products Yes No Gemini 2.0 Flash Try it in Google AI Studio Our most balanced multimodal model with great performance across all tasks, with a 1 million token context window, and built for the era of Agents. 
Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.10 (text / image / video) $0.70 (audio) Output price Free of charge $0.40 Context caching price Free of charge $0.025 / 1,000,000 tokens (text/image/video) $0.175 / 1,000,000 tokens (audio) Context caching (storage) Free of charge, up to 1,000,000 tokens of storage per hour $1.00 / 1,000,000 tokens per hour Image generation pricing Free of charge $0.039 per image* Tuning price Not available Not available Grounding with Google Search Free of charge, up to 500 RPD 1,500 RPD (free), then $35 / 1,000 requests Live API Free of charge Input: $0.35 (text), $2.10 (audio / image [video]) Output: $1.50 (text), $8.50 (audio) Used to improve our products Yes No [*] Image output is priced at $30 per 1,000,000 tokens. Output images up to 1024x1024px consume 1290 tokens and are equivalent to $0.039 per image. Gemini 2.0 Flash-Lite Try it in Google AI Studio Our smallest and most cost effective model, built for at scale usage. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.075 Output price Free of charge $0.30 Context caching price Not available Not available Context caching (storage) Not available Not available Tuning price Not available Not available Grounding with Google Search Not available Not available Used to improve our products Yes No Imagen 4 Preview Try it in Google AI Studio Our latest image generation model, with significantly better text rendering and better overall image quality. Preview models may change before becoming stable and have more restrictive rate limits. Free Tier Paid Tier, per Image in USD Imagen 4 Standard image price Not available $0.04 Imagen 4 Ultra image price Not available $0.06 Used to improve our products Yes No Imagen 3 Try it in Google AI Studio Our state-of-the-art image generation model, available to developers on the paid tier of the Gemini API. 
Free Tier Paid Tier, per Image in USD Image price Not available $0.03 Used to improve our products Yes No Veo 2 Try the API Our state-of-the-art video generation model, available to developers on the paid tier of the Gemini API. Free Tier Paid Tier, per second in USD Video price Not available $0.35 Used to improve our products Yes No Gemma 3 Try Gemma 3 Our lightweight, state-of-the-art, open model built from the same technology that powers our Gemini models. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge Not available Output price Free of charge Not available Context caching price Free of charge Not available Context caching (storage) Free of charge Not available Tuning price Not available Not available Grounding with Google Search Not available Not available Used to improve our products Yes No Gemma 3n Try Gemma 3n Our open model built for efficient performance on everyday devices like mobile phones, laptops, and tablets. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge Not available Output price Free of charge Not available Context caching price Free of charge Not available Context caching (storage) Free of charge Not available Tuning price Not available Not available Grounding with Google Search Not available Not available Used to improve our products Yes No Gemini 1.5 Flash Try it in Google AI Studio Our fastest multimodal model with great performance for diverse, repetitive tasks and a 1 million token context window. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.075, prompts <= 128k tokens $0.15, prompts > 128k tokens Output price Free of charge $0.30, prompts <= 128k tokens $0.60, prompts > 128k tokens Context caching price Free of charge, up to 1 million tokens of storage per hour $0.01875, prompts <= 128k tokens $0.0375, prompts > 128k tokens Context caching (storage) Free of charge $1.00 per hour Tuning price Token prices are the same for tuned models Tuning service is free of charge.
Token prices are the same for tuned models Tuning service is free of charge. Grounding with Google Search Not available $35 / 1K grounding requests Used to improve our products Yes No Gemini 1.5 Flash-8B Try it in Google AI Studio Our smallest model for lower intelligence use cases, with a 1 million token context window. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $0.0375, prompts <= 128k tokens $0.075, prompts > 128k tokens Output price Free of charge $0.15, prompts <= 128k tokens $0.30, prompts > 128k tokens Context caching price Free of charge, up to 1 million tokens of storage per hour $0.01, prompts <= 128k tokens $0.02, prompts > 128k tokens Context caching (storage) Free of charge $0.25 per hour Tuning price Token prices are the same for tuned models Tuning service is free of charge. Token prices are the same for tuned models Tuning service is free of charge. Grounding with Google Search Not available $35 / 1K grounding requests Used to improve our products Yes No Gemini 1.5 Pro Try it in Google AI Studio Our highest intelligence Gemini 1.5 series model, with a breakthrough 2 million token context window. Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge $1.25, prompts <= 128k tokens $2.50, prompts > 128k tokens Output price Free of charge $5.00, prompts <= 128k tokens $10.00, prompts > 128k tokens Context caching price Not available $0.3125, prompts <= 128k tokens $0.625, prompts > 128k tokens Context caching (storage) Not available $4.50 per hour Tuning price Not available Not available Grounding with Google Search Not available $35 / 1K grounding requests Used to improve our products Yes No Text Embedding 004 Our state-of-the-art text embedding model. 
Free Tier Paid Tier, per 1M tokens in USD Input price Free of charge Not available Output price Free of charge Not available Tuning price Not available Not available Used to improve our products Yes No [*] Google AI Studio usage is free of charge in all available regions. See Billing FAQs for details. [**] Prices may differ from the prices listed here and the prices offered on Vertex AI. For Vertex prices, see the Vertex AI pricing page. [***] If you are using dynamic retrieval to optimize costs, only requests that contain at least one grounding support URL from the web in their response are charged for Grounding with Google Search. Costs for Gemini always apply. Rate limits are subject to change. Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-07 UTC.
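The per-image figure in the Gemini 2.5 Flash image footnote and the per-second Veo 2 price can be checked with a few lines of arithmetic. The sketch below reproduces the footnote's calculation (1290 output tokens per image at $30 per 1,000,000 tokens) and multiplies the listed $0.35 per second by a clip length. The helper names are hypothetical, and whole-second clip durations are assumed.

```python
from decimal import Decimal

# Figures copied from the pricing page; the helper names are illustrative only.
IMAGE_OUTPUT_PER_M = Decimal("30")   # USD per 1,000,000 image output tokens
TOKENS_PER_IMAGE = 1290              # output images up to 1024x1024px
VEO2_PER_SECOND = Decimal("0.35")    # USD per second of generated video


def image_cost(n_images: int) -> Decimal:
    """USD cost of n generated images at the footnoted token rate."""
    return Decimal(n_images * TOKENS_PER_IMAGE) * IMAGE_OUTPUT_PER_M / Decimal(1_000_000)


def veo2_cost(seconds: int) -> Decimal:
    """USD cost of a Veo 2 clip, assuming a whole number of seconds."""
    return Decimal(seconds) * VEO2_PER_SECOND


if __name__ == "__main__":
    per_image = image_cost(1)
    # Rounded to three decimal places, this matches the "$0.039 per image" figure.
    print(per_image, per_image.quantize(Decimal("0.001")))
    print(veo2_cost(8))
```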
|
text_content/docs_prompting-strategies_313b4935.txt
ADDED
|
@@ -0,0 +1,5 @@
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/prompting-strategies#few-shot
|
| 2 |
+
Title: Prompt design strategies | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Prompt design strategies Prompt design is the process of creating prompts, or natural language requests, that elicit accurate, high quality responses from a language model. This page introduces basic concepts, strategies, and best practices to get you started designing prompts to get the most out of Gemini AI models. Topic-specific prompt guides Looking for more specific prompt strategies? Check out our other prompting guides on: Prompting with media files Prompting for image generation Prompting for video generation Google AI Studio also hosts a prompt gallery meant to interactively showcase many of the concepts shared in this guide. Clear and specific instructions An effective and efficient way to customize model behavior is to provide it with clear and specific instructions. Instructions can be in the form of a question, step-by-step tasks, or as complex as mapping out a user's experience and mindset. Input Input is the required text in the prompt that you want the model to provide a response to. Inputs can be a question that the model answers (question input), a task the model performs (task input), an entity the model operates on (entity input), or partial input that the model completes or continues (completion input). Input type Prompt Generated output Question What's a good name for a flower shop that specializes in selling bouquets of dried flowers? Create a list of 5 options with just the names. Here are 10 names for a flower shop specializing in dried flowers: 1. Everlasting Blooms 2. Dried & Delightful 3. The Petal Preserve 4. 
Whispers of Wildflowers 5. Forever Blooms Task Give me a simple list of just the things that I must bring on a camping trip. The list should have 5 items. 1. **Tent:** Your shelter from the elements. 2. **Sleeping Bag:** Stay warm and comfortable. 3. **Sleeping Pad:** Insulation and cushioning for a good night's sleep. 4. **Headlamp or Flashlight:** Light for nighttime activities. 5. **First Aid Kit:** Be prepared for minor injuries. Entity Classify the following items as [large, small]: Elephant Mouse Snail Here's the classification: * **Elephant:** Large * **Mouse:** Small * **Snail:** Small Partial input completion Generative language models work like an advanced auto completion tool. When you provide partial content, the model can provide the rest of the content or what it thinks is a continuation of that content as a response. When doing so, if you include any examples or context, the model can take those examples or context into account. The following example provides a prompt with an instruction and an entity input: Prompt: For the given order, return a JSON object that has the fields cheeseburger, hamburger, fries, or drink, with the value being the quantity. Order: A burger and a drink. Response: { "cheeseburger": 0, "hamburger": 1, "fries": 0, "drink": 1 } (gemini-2.5-flash) While the model did as prompted, writing out the instructions in natural language can sometimes be challenging and it leaves a lot to the model's interpretation. For example, a restaurant's menu might contain many items. To reduce the size of the JSON response, you probably want to omit the items that weren't ordered. In this case, you can give an example and a response prefix and let the model complete it: Prompt: Valid fields are cheeseburger, hamburger, fries, and drink. Order: Give me a cheeseburger and fries Output: ``` { "cheeseburger": 1, "fries": 1 } ``` Order: I want two burgers, a drink, and fries. 
Output: Response: ``` { "hamburger": 2, "drink": 1, "fries": 1 } ``` (gemini-2.5-flash) Notice how "cheeseburger" was excluded from the output because it wasn't a part of the order. Constraints Specify any constraints on reading the prompt or generating a response. You can tell the model what to do and not to do. For example, you can specify a constraint in the prompt on how long you want a summary to be: Prompt: Summarize this text in one sentence: Text: A quantum computer exploits quantum mechanical phenomena to perform calculations exponentially faster than any modern traditional computer. At very tiny scales, physical matter acts as both particles and as waves, and quantum computing uses specialized hardware to leverage this behavior. The operating principles of quantum devices is beyond the scope of classical physics. When deployed at scale, quantum computers could be used in a wide variety of applications such as: in cybersecurity to break existing encryption methods while helping researchers create new ones, in meteorology to develop better weather forecasting etc. However, the current state of the art quantum computers are still largely experimental and impractical. Response: Exploiting quantum mechanical phenomena, quantum computers can perform calculations exponentially faster than traditional computers for potential applications like cybersecurity and meteorology, although they are currently largely experimental and impractical. (gemini-2.5-flash) Response format You can give instructions that specify the format of the response. For example, you can ask for the response to be formatted as a table, bulleted list, elevator pitch, keywords, sentence, or paragraph. The following system instruction tells the model to be more conversational in its response: System instruction All questions should be answered comprehensively with details, unless the user requests a concise response specifically. 
Prompt What is a smart way to make a business that sells DVD's in 2025? Response: Starting a business selling DVDs in 2025 presents significant challenges due to the overwhelming dominance of digital streaming and on-demand services; a "smart" approach would therefore require a highly specialized niche, a strong value proposition beyond just the disc, and likely a focus on community or collector markets rather than mainstream consumption. Here's a breakdown of what a smart DVD business in 2025 would likely entail: 1. Extreme Niche Specialization: You cannot compete with streaming libraries. Your business must focus on what digital doesn't offer easily or at all. This could include: * Out-of-Print & Rare Titles: Sourcing films or TV shows that have never been released digitally or are currently unavailable on major streaming platforms. .... (gemini-2.5-flash) Format responses with the completion strategy The completion strategy can also help format the response. The following example prompts the model to create an essay outline: Prompt: Create an outline for an essay about hummingbirds. Response: Here is a possible outline for an essay about hummingbirds: I. Introduction A. Hook: Start with a captivating fact or description about hummingbirds (e.g., their speed, iridescent colors, or tiny size). B. Background: Briefly introduce hummingbirds – where they are found (Americas), their reputation as unique birds. C. Thesis Statement: State the main point of the essay, which could be about the remarkable adaptations, behaviors, and ecological importance of hummingbirds. II. Appearance and Identification ..... (gemini-2.5-flash) The prompt didn't specify the format for the outline and the model chose a format for you. To get the model to return an outline in a specific format, you can add text that represents the start of the outline and let the model complete it based on the pattern that you initiated. Prompt: Create an outline for an essay about hummingbirds. I. 
Introduction * Response: Okay, here is an outline for an essay about hummingbirds, starting from the introduction you provided: I. Introduction * Hook: Start with a captivating fact or description about hummingbirds (e.g., their speed, iridescent colors, or tiny size). * Background: Briefly introduce hummingbirds – where they are found (Americas), their reputation as unique birds, and their general characteristics. * Thesis Statement: State the main point of the essay, focusing on the remarkable adaptations, behaviors, and ecological significance that make hummingbirds extraordinary. ..... (gemini-2.5-flash) Zero-shot vs few-shot prompts You can include examples in the prompt that show the model what getting it right looks like. The model attempts to identify patterns and relationships from the examples and applies them when generating a response. Prompts that contain a few examples are called few-shot prompts, while prompts that provide no examples are called zero-shot prompts. Few-shot prompts are often used to regulate the formatting, phrasing, scoping, or general patterning of model responses. Use specific and varied examples to help the model narrow its focus and generate more accurate results. We recommend always including few-shot examples in your prompts. Prompts without few-shot examples are likely to be less effective. In fact, you can remove instructions from your prompt if your examples are clear enough in showing the task at hand. The following zero-shot prompt asks the model to choose the best explanation. Prompt: Please choose the best explanation to the question: Question: How is snow formed? Explanation1: Snow is formed when water vapor in the air freezes into ice crystals in the atmosphere, which can combine and grow into snowflakes as they fall through the atmosphere and accumulate on the ground. Explanation2: Water vapor freezes into ice crystals forming snow. 
Answer: Response: Explanation1 is the better explanation because it provides more detail on the process, including how ice crystals combine and grow into snowflakes as they fall through the atmosphere. (gemini-2.5-flash) If your use case requires the model to produce concise responses, you can include examples in the prompt that give preference to concise responses. The following prompt provides two examples that show preference to the shorter explanations. In the response, you can see that the examples guided the model to choose the shorter explanation ( Explanation2 ) as opposed to the longer explanation ( Explanation1 ) like it did previously. Prompt: Below are some examples showing a question, explanation, and answer format: Question: Why is the sky blue? Explanation1: The sky appears blue because of Rayleigh scattering, which causes shorter blue wavelengths of light to be scattered more easily than longer red wavelengths, making the sky look blue. Explanation2: Due to Rayleigh scattering effect. Answer: Explanation2 Question: What is the cause of earthquakes? Explanation1: Sudden release of energy in the Earth's crust. Explanation2: Earthquakes happen when tectonic plates suddenly slip or break apart, causing a release of energy that creates seismic waves that can shake the ground and cause damage. Answer: Explanation1 Now, Answer the following question given the example formats above: Question: How is snow formed? Explanation1: Snow is formed when water vapor in the air freezes into ice crystals in the atmosphere, which can combine and grow into snowflakes as they fall through the atmosphere and accumulate on the ground. Explanation2: Water vapor freezes into ice crystals forming snow. Answer: Response: Answer: Explanation2 (gemini-2.5-flash) Optimal number of examples Models like Gemini can often pick up on patterns using a few examples, though you may need to experiment with the number of examples to provide in the prompt for the best results. 
At the same time, if you include too many examples, the model may start to overfit the response to the examples. Patterns vs anti patterns Using examples to show the model a pattern to follow is more effective than using examples to show the model an anti pattern to avoid. ⛔ Negative pattern: Example Don't end haikus with a question: Haiku are fun A short and simple poem Don't you enjoy them? ✅ Positive pattern: Example Always end haikus with an assertion: Haiku are fun A short and simple poem A joy to write Consistent formatting Make sure that the structure and formatting of few-shot examples are the same to avoid responses with undesired formats. One of the primary objectives of adding few-shot examples in prompts is to show the model the response format. Therefore, it is essential to ensure a consistent format across all examples, especially paying attention to XML tags, white spaces, newlines, and example splitters. Add context You can include instructions and information in a prompt that the model needs to solve a problem, instead of assuming that the model has all of the required information. This contextual information helps the model understand the constraints and details of what you're asking for it to do. The following example asks the model to give troubleshooting guidance for a router: Prompt: What should I do to fix my disconnected wifi? The light on my Google Wifi router is yellow and blinking slowly. Response: A yellow blinking light on your Google Wifi router usually indicates a problem with the internet connection. Here are some steps to troubleshoot and fix the issue: 1. Restart your router and modem: - Unplug your modem from the power outlet and wait for 30 seconds. - Plug your modem back in and wait for it to fully boot up. - Unplug your Google Wifi router and wait for 30 seconds. - Plug your Google Wifi router back in and wait for it to fully boot up. ..... 
(gemini-2.5-flash) The response looks like generic troubleshooting information that's not specific to the router or the status of the LED indicator lights. To customize the response for the specific router, you can add to the prompt the router's troubleshooting guide as context for it to refer to when providing a response. Prompt: Answer the question using the text below. Respond with only the text provided. Question: What should I do to fix my disconnected wifi? The light on my Google Wifi router is yellow and blinking slowly. Text: Color: Slowly pulsing yellow What it means: There is a network error. What to do: Check that the Ethernet cable is connected to both your router and your modem and both devices are turned on. You might need to unplug and plug in each device again. Color: Fast blinking yellow What it means: You are holding down the reset button and are factory resetting this device. What to do: If you keep holding down the reset button, after about 12 seconds, the light will turn solid yellow. Once it is solid yellow, let go of the factory reset button. Color: Solid yellow What it means: Router is factory resetting. What to do: This can take up to 10 minutes. When it's done, the device will reset itself and start pulsing white, letting you know it's ready for setup. Color: Solid red What it means: Something is wrong. What to do: Critical failure. Factory reset the router. If the light stays red, contact Wifi customer support. Response: Check that the Ethernet cable is connected to both your router and your modem and both devices are turned on. You might need to unplug and plug in each device again. (gemini-2.5-flash) Add prefixes A prefix is a word or phrase that you add to the prompt content that can serve several purposes, depending on where you put the prefix: Input prefix: Adding a prefix to the input signals semantically meaningful parts of the input to the model. For example, the prefixes "English:" and "French:" demarcate two different languages. 
Output prefix: Even though the output is generated by the model, you can add a prefix for the output in the prompt. The output prefix gives the model information about what's expected as a response. For example, the output prefix "JSON:" signals to the model that the output should be in JSON format. Example prefix: In few-shot prompts, adding prefixes to the examples provides labels that the model can use when generating the output, which makes it easier to parse output content. In the following example, "Text:" is the input prefix and "The answer is:" is the output prefix. Prompt: Classify the text as one of the following categories. - large - small Text: Rhino The answer is: large Text: Mouse The answer is: small Text: Snail The answer is: small Text: Elephant The answer is: Response: The answer is: large (gemini-2.5-flash) Break down prompts into components For use cases that require complex prompts, you can help the model manage this complexity by breaking things down into simpler components. Break down instructions: Instead of having many instructions in one prompt, create one prompt per instruction. You can choose which prompt to process based on the user's input. Chain prompts: For complex tasks that involve multiple sequential steps, make each step a prompt and chain the prompts together in a sequence. In this sequential chain of prompts, the output of one prompt in the sequence becomes the input of the next prompt. The output of the last prompt in the sequence is the final output. Aggregate responses: Aggregation is when you want to perform different parallel tasks on different portions of the data and aggregate the results to produce the final output. For example, you can tell the model to perform one operation on the first part of the data, perform another operation on the rest of the data and aggregate the results. 
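The chaining and aggregation strategies described above can be sketched in a few lines of Python. This is a minimal illustration, not the Gemini SDK: `call_model` is a hypothetical callable that takes a prompt string and returns the response text, `fake_model` is a stand-in used only so the sketch runs offline, and the `{input}` placeholder convention is an assumption made for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: takes a prompt, returns the model's response text.
ModelFn = Callable[[str], str]


def fill(template: str, text: str) -> str:
    """Substitute the {input} placeholder without touching other braces (e.g. JSON)."""
    return template.replace("{input}", text)


def run_chain(templates: List[str], first_input: str, call_model: ModelFn) -> str:
    """Chain prompts: the output of each step becomes the input of the next."""
    current = first_input
    for template in templates:
        current = call_model(fill(template, current))
    return current  # the output of the last prompt is the final output


def run_aggregate(tasks: List[Tuple[str, str]], call_model: ModelFn,
                  joiner: str = "\n") -> str:
    """Aggregate responses: run one (template, data portion) pair per task, then combine."""
    return joiner.join(call_model(fill(template, portion)) for template, portion in tasks)


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call: upper-cases the last line of the prompt."""
    return prompt.splitlines()[-1].upper()


if __name__ == "__main__":
    steps = ["Summarize the text:\n{input}", "Translate the summary to French:\n{input}"]
    print(run_chain(steps, "hummingbirds are small", fake_model))
    tasks = [("Extract keywords:\n{input}", "part one"),
             ("Summarize:\n{input}", "part two")]
    print(run_aggregate(tasks, fake_model))
```

To use this with a real model, replace `fake_model` with a function that sends the prompt to the API of your choice. Keeping the model call behind a plain callable also makes each step easy to test in isolation.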
Experiment with model parameters Each call that you send to a model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values. Experiment with different parameter values to get the best values for the task. The parameters available for different models may differ. The most common parameters are the following: Max output tokens: Specifies the maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words. Temperature: The temperature controls the degree of randomness in token selection. The temperature is used for sampling during response generation, which occurs when topP and topK are applied. Lower temperatures are good for prompts that require a more deterministic or less open-ended response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 is deterministic, meaning that the highest probability response is always selected. topK : The topK parameter changes how the model selects tokens for output. A topK of 1 means the selected token is the most probable among all the tokens in the model's vocabulary (also called greedy decoding), while a topK of 3 means that the next token is selected from among the 3 most probable using the temperature. For each token selection step, the topK tokens with the highest probabilities are sampled. Tokens are then further filtered based on topP with the final token selected using temperature sampling. topP : The topP parameter changes how the model selects tokens for output. Tokens are selected from the most to least probable until the sum of their probabilities equals the topP value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1 and the topP value is 0.5, then the model will select either A or B as the next token by using the temperature and exclude C as a candidate. 
The default topP value is 0.95. stop_sequences : Set a stop sequence to tell the model to stop generating content. A stop sequence can be any sequence of characters. Try to avoid using a sequence of characters that may appear in the generated content. Prompt iteration strategies Prompt design can sometimes require a few iterations before you consistently get the response you're looking for. This section provides guidance on some things you can try when iterating on your prompts: Use different phrasing: Using different words or phrasing in your prompts often yields different responses from the model even though they all mean the same thing. If you're not getting the expected results from your prompt, try rephrasing it. Version 1: How do I bake a pie? Version 2: Suggest a recipe for a pie. Version 3: What's a good pie recipe? Switch to an analogous task: If you can't get the model to follow your instructions for a task, try giving it instructions for an analogous task that achieves the same result. This prompt tells the model to categorize a book by using predefined categories: Prompt: Which category does The Odyssey belong to: thriller sci-fi mythology biography Response: The Odyssey belongs to the category of **mythology**. Here's why: * **Mythology:** The Odyssey tells the story of Odysseus, a hero from Greek mythology, and his journey home after the Trojan War. It features gods, monsters, and supernatural events common to Greek mythology. ..... (gemini-2.5-flash) The response is correct, but the model didn't stay within the bounds of the options. You also want the model to respond with just one of the options instead of a full sentence. In this case, you can rephrase the instructions as a multiple choice question and ask the model to choose an option. Prompt: Multiple choice problem: Which of the following options describes the book The Odyssey? Options: thriller sci-fi mythology biography Response: The correct answer is mythology. 
(gemini-2.5-flash) Change the order of prompt content: The order of the content in the prompt can sometimes affect the response. Try changing the content order and see how that affects the response. Version 1 : [ examples ] [ context ] [ input ] Version 2 : [ input ] [ examples ] [ context ] Version 3 : [ examples ] [ input ] [ context ] Fallback responses A fallback response is a response returned by the model when either the prompt or the response triggers a safety filter. An example of a fallback response is "I'm not able to help with that, as I'm only a language model." If the model responds with a fallback response, try increasing the temperature. Things to avoid Avoid relying on models to generate factual information. Use with care on math and logic problems. Generative models under the hood This section aims to answer the question - Is there randomness in generative models' responses, or are they deterministic? The short answer - yes to both. When you prompt a generative model, a text response is generated in two stages. In the first stage, the generative model processes the input prompt and generates a probability distribution over possible tokens (words) that are likely to come next. For example, if you prompt with the input text "The dog jumped over the ... ", the generative model will produce an array of probable next words: [("fence", 0.77), ("ledge", 0.12), ("blanket", 0.03), ...] This process is deterministic; a generative model will produce this same distribution every time it's input the same prompt text. In the second stage, the generative model converts these distributions into actual text responses through one of several decoding strategies. A simple decoding strategy might select the most likely token at every timestep. This process would always be deterministic. However, you could instead choose to generate a response by randomly sampling over the distribution returned by the model. This process would be stochastic (random). 
Control the degree of randomness allowed in this decoding process by setting the temperature. A temperature of 0 means only the most likely tokens are selected, and there's no randomness. Conversely, a high temperature injects a high degree of randomness into the tokens selected by the model, leading to more unexpected, surprising model responses. Next steps Now that you have a deeper understanding of prompt design, try writing your own prompts using Google AI Studio. To learn about multimodal prompting, see Prompting with media files. To learn about image prompting, see the Imagen prompt guide. To learn about video prompting, see the Veo prompt guide. Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-04-28 UTC.
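The two-stage decoding process and the topK, topP, and temperature parameters described in this guide can be illustrated with a small, self-contained sketch. It follows the order given in the text (keep the topK most probable tokens, filter by topP, then sample using the temperature). The function names are hypothetical, the default values are chosen only for the example, and how Gemini implements sampling internally is not specified here, so treat this as an approximation under those assumptions.

```python
import random
from typing import Dict, List, Optional, Tuple


def filter_candidates(probs: Dict[str, float], top_k: int,
                      top_p: float) -> List[Tuple[str, float]]:
    """Keep the top_k most probable tokens, then the shortest prefix whose mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        total += p
        if total >= top_p - 1e-9:  # small tolerance for floating-point sums
            break
    return kept


def sample_token(probs: Dict[str, float], temperature: float = 1.0, top_k: int = 40,
                 top_p: float = 0.95, rng: Optional[random.Random] = None) -> str:
    """Pick the next token from a distribution of positive probabilities."""
    candidates = filter_candidates(probs, top_k, top_p)
    if temperature == 0:
        return candidates[0][0]  # greedy decoding: always the most probable token
    rng = rng or random.Random()
    top = candidates[0][1]
    # Dividing logits by the temperature is equivalent to raising each probability
    # to the power 1/temperature; dividing by the largest one avoids underflow.
    weights = [(p / top) ** (1.0 / temperature) for _, p in candidates]
    return rng.choices([t for t, _ in candidates], weights=weights, k=1)[0]


if __name__ == "__main__":
    # Distribution from the "The dog jumped over the ..." example in the text.
    dist = {"fence": 0.77, "ledge": 0.12, "blanket": 0.03}
    print(sample_token(dist, temperature=0))    # deterministic
    print(sample_token(dist, temperature=1.5))  # stochastic
```

With the A, B, C example from the topP description (probabilities 0.3, 0.2, and 0.1, topP of 0.5), `filter_candidates` keeps A and B and drops C, which matches the behavior described in the text.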
|
text_content/docs_prompting-strategies_44c9eff4.txt
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
URL: https://ai.google.dev/gemini-api/docs/prompting-strategies#main-content
|
| 2 |
+
Title: Prompt design strategies | Gemini API | Google AI for Developers
|
| 3 |
+
==================================================
|
| 4 |
+
|
| 5 |
+
Prompt design strategies | Gemini API | Google AI for Developers Prompt design strategies Prompt design is the process of creating prompts, or natural language requests, that elicit accurate, high quality responses from a language model. This page introduces basic concepts, strategies, and best practices to get you started designing prompts to get the most out of Gemini AI models. Topic-specific prompt guides Looking for more specific prompt strategies? Check out our other prompting guides on: Prompting with media files Prompting for image generation Prompting for video generation Google AI Studio also hosts a prompt gallery meant to interactively showcase many of the concepts shared in this guide. Clear and specific instructions An effective and efficient way to customize model behavior is to provide it with clear and specific instructions. Instructions can be in the form of a question, step-by-step tasks, or as complex as mapping out a user's experience and mindset. Input Input is the required text in the prompt that you want the model to provide a response to. Inputs can be a question that the model answers (question input), a task the model performs (task input), an entity the model operates on (entity input), or partial input that the model completes or continues (completion input). Input type Prompt Generated output Question What's a good name for a flower shop that specializes in selling bouquets of dried flowers? Create a list of 5 options with just the names. Here are 10 names for a flower shop specializing in dried flowers: 1. Everlasting Blooms 2. Dried & Delightful 3. The Petal Preserve 4. 
Whispers of Wildflowers 5. Forever Blooms Task Give me a simple list of just the things that I must bring on a camping trip. The list should have 5 items. 1. **Tent:** Your shelter from the elements. 2. **Sleeping Bag:** Stay warm and comfortable. 3. **Sleeping Pad:** Insulation and cushioning for a good night's sleep. 4. **Headlamp or Flashlight:** Light for nighttime activities. 5. **First Aid Kit:** Be prepared for minor injuries. Entity Classify the following items as [large, small]: Elephant Mouse Snail Here's the classification: * **Elephant:** Large * **Mouse:** Small * **Snail:** Small Partial input completion Generative language models work like an advanced auto completion tool. When you provide partial content, the model can provide the rest of the content or what it thinks is a continuation of that content as a response. When doing so, if you include any examples or context, the model can take those examples or context into account. The following example provides a prompt with an instruction and an entity input: Prompt: For the given order, return a JSON object that has the fields cheeseburger, hamburger, fries, or drink, with the value being the quantity. Order: A burger and a drink. Response: { "cheeseburger": 0, "hamburger": 1, "fries": 0, "drink": 1 } (gemini-2.5-flash) While the model did as prompted, writing out the instructions in natural language can sometimes be challenging and it leaves a lot to the model's interpretation. For example, a restaurant's menu might contain many items. To reduce the size of the JSON response, you probably want to omit the items that weren't ordered. In this case, you can give an example and a response prefix and let the model complete it: Prompt: Valid fields are cheeseburger, hamburger, fries, and drink. Order: Give me a cheeseburger and fries Output: ``` { "cheeseburger": 1, "fries": 1 } ``` Order: I want two burgers, a drink, and fries. 
Output: Response: ``` { "hamburger": 2, "drink": 1, "fries": 1 } ``` (gemini-2.5-flash) Notice how "cheeseburger" was excluded from the output because it wasn't a part of the order. Constraints Specify any constraints on reading the prompt or generating a response. You can tell the model what to do and not to do. For example, you can specify a constraint in the prompt on how long you want a summary to be: Prompt: Summarize this text in one sentence: Text: A quantum computer exploits quantum mechanical phenomena to perform calculations exponentially faster than any modern traditional computer. At very tiny scales, physical matter acts as both particles and as waves, and quantum computing uses specialized hardware to leverage this behavior. The operating principles of quantum devices is beyond the scope of classical physics. When deployed at scale, quantum computers could be used in a wide variety of applications such as: in cybersecurity to break existing encryption methods while helping researchers create new ones, in meteorology to develop better weather forecasting etc. However, the current state of the art quantum computers are still largely experimental and impractical. Response: Exploiting quantum mechanical phenomena, quantum computers can perform calculations exponentially faster than traditional computers for potential applications like cybersecurity and meteorology, although they are currently largely experimental and impractical. (gemini-2.5-flash) Response format You can give instructions that specify the format of the response. For example, you can ask for the response to be formatted as a table, bulleted list, elevator pitch, keywords, sentence, or paragraph. The following system instruction tells the model to be more conversational in its response: System instruction All questions should be answered comprehensively with details, unless the user requests a concise response specifically. 
Prompt What is a smart way to make a business that sells DVD's in 2025? Response: Starting a business selling DVDs in 2025 presents significant challenges due to the overwhelming dominance of digital streaming and on-demand services; a "smart" approach would therefore require a highly specialized niche, a strong value proposition beyond just the disc, and likely a focus on community or collector markets rather than mainstream consumption. Here's a breakdown of what a smart DVD business in 2025 would likely entail: 1. Extreme Niche Specialization: You cannot compete with streaming libraries. Your business must focus on what digital doesn't offer easily or at all. This could include: * Out-of-Print & Rare Titles: Sourcing films or TV shows that have never been released digitally or are currently unavailable on major streaming platforms. .... (gemini-2.5-flash) Format responses with the completion strategy The completion strategy can also help format the response. The following example prompts the model to create an essay outline: Prompt: Create an outline for an essay about hummingbirds. Response: Here is a possible outline for an essay about hummingbirds: I. Introduction A. Hook: Start with a captivating fact or description about hummingbirds (e.g., their speed, iridescent colors, or tiny size). B. Background: Briefly introduce hummingbirds – where they are found (Americas), their reputation as unique birds. C. Thesis Statement: State the main point of the essay, which could be about the remarkable adaptations, behaviors, and ecological importance of hummingbirds. II. Appearance and Identification ..... (gemini-2.5-flash) The prompt didn't specify the format for the outline and the model chose a format for you. To get the model to return an outline in a specific format, you can add text that represents the start of the outline and let the model complete it based on the pattern that you initiated. Prompt: Create an outline for an essay about hummingbirds. I. 
Introduction * Response: Okay, here is an outline for an essay about hummingbirds, starting from the introduction you provided: I. Introduction * Hook: Start with a captivating fact or description about hummingbirds (e.g., their speed, iridescent colors, or tiny size). * Background: Briefly introduce hummingbirds – where they are found (Americas), their reputation as unique birds, and their general characteristics. * Thesis Statement: State the main point of the essay, focusing on the remarkable adaptations, behaviors, and ecological significance that make hummingbirds extraordinary. ..... (gemini-2.5-flash) Zero-shot vs few-shot prompts You can include examples in the prompt that show the model what getting it right looks like. The model attempts to identify patterns and relationships from the examples and applies them when generating a response. Prompts that contain a few examples are called few-shot prompts, while prompts that provide no examples are called zero-shot prompts. Few-shot prompts are often used to regulate the formatting, phrasing, scoping, or general patterning of model responses. Use specific and varied examples to help the model narrow its focus and generate more accurate results. We recommend always including few-shot examples in your prompts. Prompts without few-shot examples are likely to be less effective. In fact, you can remove instructions from your prompt if your examples are clear enough in showing the task at hand. The following zero-shot prompt asks the model to choose the best explanation. Prompt: Please choose the best explanation to the question: Question: How is snow formed? Explanation1: Snow is formed when water vapor in the air freezes into ice crystals in the atmosphere, which can combine and grow into snowflakes as they fall through the atmosphere and accumulate on the ground. Explanation2: Water vapor freezes into ice crystals forming snow. 
Answer: Response: Explanation1 is the better explanation because it provides more detail on the process, including how ice crystals combine and grow into snowflakes as they fall through the atmosphere. (gemini-2.5-flash) If your use case requires the model to produce concise responses, you can include examples in the prompt that give preference to concise responses. The following prompt provides two examples that show a preference for the shorter explanation. In the response, you can see that the examples guided the model to choose the shorter explanation (Explanation2) instead of the longer explanation (Explanation1) that it chose previously. Prompt: Below are some examples showing a question, explanation, and answer format: Question: Why is the sky blue? Explanation1: The sky appears blue because of Rayleigh scattering, which causes shorter blue wavelengths of light to be scattered more easily than longer red wavelengths, making the sky look blue. Explanation2: Due to Rayleigh scattering effect. Answer: Explanation2 Question: What is the cause of earthquakes? Explanation1: Sudden release of energy in the Earth's crust. Explanation2: Earthquakes happen when tectonic plates suddenly slip or break apart, causing a release of energy that creates seismic waves that can shake the ground and cause damage. Answer: Explanation1 Now, Answer the following question given the example formats above: Question: How is snow formed? Explanation1: Snow is formed when water vapor in the air freezes into ice crystals in the atmosphere, which can combine and grow into snowflakes as they fall through the atmosphere and accumulate on the ground. Explanation2: Water vapor freezes into ice crystals forming snow. Answer: Response: Answer: Explanation2 (gemini-2.5-flash) Optimal number of examples Models like Gemini can often pick up on patterns using a few examples, though you may need to experiment with the number of examples to provide in the prompt for the best results.
At the same time, if you include too many examples, the model may start to overfit the response to the examples. Patterns vs anti patterns Using examples to show the model a pattern to follow is more effective than using examples to show the model an anti pattern to avoid. ⛔ Negative pattern: Example Don't end haikus with a question: Haiku are fun A short and simple poem Don't you enjoy them? ✅ Positive pattern: Example Always end haikus with an assertion: Haiku are fun A short and simple poem A joy to write Consistent formatting Make sure that the structure and formatting of few-shot examples are the same to avoid responses with undesired formats. One of the primary objectives of adding few-shot examples in prompts is to show the model the response format. Therefore, it is essential to ensure a consistent format across all examples, especially paying attention to XML tags, white spaces, newlines, and example splitters. Add context You can include instructions and information in a prompt that the model needs to solve a problem, instead of assuming that the model has all of the required information. This contextual information helps the model understand the constraints and details of what you're asking it to do. The following example asks the model to give troubleshooting guidance for a router: Prompt: What should I do to fix my disconnected wifi? The light on my Google Wifi router is yellow and blinking slowly. Response: A yellow blinking light on your Google Wifi router usually indicates a problem with the internet connection. Here are some steps to troubleshoot and fix the issue: 1. Restart your router and modem: - Unplug your modem from the power outlet and wait for 30 seconds. - Plug your modem back in and wait for it to fully boot up. - Unplug your Google Wifi router and wait for 30 seconds. - Plug your Google Wifi router back in and wait for it to fully boot up. .....
(gemini-2.5-flash) The response looks like generic troubleshooting information that's not specific to the router or the status of the LED indicator lights. To customize the response for the specific router, you can add to the prompt the router's troubleshooting guide as context for it to refer to when providing a response. Prompt: Answer the question using the text below. Respond with only the text provided. Question: What should I do to fix my disconnected wifi? The light on my Google Wifi router is yellow and blinking slowly. Text: Color: Slowly pulsing yellow What it means: There is a network error. What to do: Check that the Ethernet cable is connected to both your router and your modem and both devices are turned on. You might need to unplug and plug in each device again. Color: Fast blinking yellow What it means: You are holding down the reset button and are factory resetting this device. What to do: If you keep holding down the reset button, after about 12 seconds, the light will turn solid yellow. Once it is solid yellow, let go of the factory reset button. Color: Solid yellow What it means: Router is factory resetting. What to do: This can take up to 10 minutes. When it's done, the device will reset itself and start pulsing white, letting you know it's ready for setup. Color: Solid red What it means: Something is wrong. What to do: Critical failure. Factory reset the router. If the light stays red, contact Wifi customer support. Response: Check that the Ethernet cable is connected to both your router and your modem and both devices are turned on. You might need to unplug and plug in each device again. (gemini-2.5-flash) Add prefixes A prefix is a word or phrase that you add to the prompt content that can serve several purposes, depending on where you put the prefix: Input prefix: Adding a prefix to the input signals semantically meaningful parts of the input to the model. For example, the prefixes "English:" and "French:" demarcate two different languages. 
Output prefix: Even though the output is generated by the model, you can add a prefix for the output in the prompt. The output prefix gives the model information about what's expected as a response. For example, the output prefix "JSON:" signals to the model that the output should be in JSON format. Example prefix: In few-shot prompts, adding prefixes to the examples provides labels that the model can use when generating the output, which makes it easier to parse output content. In the following example, "Text:" is the input prefix and "The answer is:" is the output prefix. Prompt: Classify the text as one of the following categories. - large - small Text: Rhino The answer is: large Text: Mouse The answer is: small Text: Snail The answer is: small Text: Elephant The answer is: Response: The answer is: large (gemini-2.5-flash) Break down prompts into components For use cases that require complex prompts, you can help the model manage this complexity by breaking things down into simpler components. Break down instructions: Instead of having many instructions in one prompt, create one prompt per instruction. You can choose which prompt to process based on the user's input. Chain prompts: For complex tasks that involve multiple sequential steps, make each step a prompt and chain the prompts together in a sequence. In this sequential chain of prompts, the output of one prompt in the sequence becomes the input of the next prompt. The output of the last prompt in the sequence is the final output. Aggregate responses: Aggregation is when you want to perform different parallel tasks on different portions of the data and aggregate the results to produce the final output. For example, you can tell the model to perform one operation on the first part of the data, perform another operation on the rest of the data and aggregate the results. 
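As a rough sketch of the chaining and aggregation strategies above, the following Python shows one way the plumbing could look. Everything named here is an assumption made for illustration: `call_model` stands for whatever function sends a prompt to the model and returns its text, `echo_model` is a stand-in so the sketch runs without an API, and the `{input}` placeholder is just a convention chosen for this example.

```python
from typing import Callable, Sequence


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical): wraps the prompt in <>."""
    return f"<{prompt}>"


def chain_prompts(call_model: Callable[[str], str],
                  templates: Sequence[str],
                  user_input: str) -> str:
    """Run prompts in sequence; each output becomes the next prompt's input."""
    text = user_input
    for template in templates:
        # "{input}" is an illustrative placeholder, not part of any API.
        text = call_model(template.replace("{input}", text))
    return text  # output of the last prompt is the final output


def aggregate_responses(call_model: Callable[[str], str],
                        first_prompt: str,
                        rest_prompt: str,
                        combine_prompt: str,
                        items: Sequence[str],
                        split_at: int) -> str:
    """Apply one prompt to the first part of the data, another to the rest,
    then ask the model to combine the two partial results."""
    first = call_model(first_prompt.replace("{input}", "\n".join(items[:split_at])))
    rest = call_model(rest_prompt.replace("{input}", "\n".join(items[split_at:])))
    combined = f"{combine_prompt}\n\nFirst result:\n{first}\n\nSecond result:\n{rest}"
    return call_model(combined)


if __name__ == "__main__":
    print(chain_prompts(echo_model,
                        ["Summarize: {input}", "Translate to French: {input}"],
                        "some long text"))
```

In practice you would replace `echo_model` with a function that calls the model you are using. The two partial calls in `aggregate_responses` do not depend on each other, so they could also be issued in parallel.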
Experiment with model parameters Each call that you send to a model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values. Experiment with different parameter values to get the best values for the task. The parameters available for different models may differ. The most common parameters are the following: Max output tokens: Specifies the maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words. Temperature: The temperature controls the degree of randomness in token selection. The temperature is used for sampling during response generation, which occurs when topP and topK are applied. Lower temperatures are good for prompts that require a more deterministic or less open-ended response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 is deterministic, meaning that the highest probability response is always selected. topK: The topK parameter changes how the model selects tokens for output. A topK of 1 means the selected token is the most probable among all the tokens in the model's vocabulary (also called greedy decoding), while a topK of 3 means that the next token is selected from among the 3 most probable using the temperature. For each token selection step, the topK tokens with the highest probabilities are sampled. Tokens are then further filtered based on topP with the final token selected using temperature sampling. topP: The topP parameter changes how the model selects tokens for output. Tokens are selected from the most to least probable until the sum of their probabilities equals the topP value. For example, if tokens A, B, and C have probabilities of 0.3, 0.2, and 0.1 and the topP value is 0.5, then the model will select either A or B as the next token by using the temperature and exclude C as a candidate.
The default topP value is 0.95. stop_sequences: Set a stop sequence to tell the model to stop generating content. A stop sequence can be any sequence of characters. Try to avoid using a sequence of characters that may appear in the generated content. Prompt iteration strategies Prompt design can sometimes require a few iterations before you consistently get the response you're looking for. This section provides guidance on some things you can try when iterating on your prompts: Use different phrasing: Using different words or phrasing in your prompts often yields different responses from the model even though they all mean the same thing. If you're not getting the expected results from your prompt, try rephrasing it. Version 1: How do I bake a pie? Version 2: Suggest a recipe for a pie. Version 3: What's a good pie recipe? Switch to an analogous task: If you can't get the model to follow your instructions for a task, try giving it instructions for an analogous task that achieves the same result. This prompt tells the model to categorize a book by using predefined categories: Prompt: Which category does The Odyssey belong to: thriller sci-fi mythology biography Response: The Odyssey belongs to the category of **mythology**. Here's why: * **Mythology:** The Odyssey tells the story of Odysseus, a hero from Greek mythology, and his journey home after the Trojan War. It features gods, monsters, and supernatural events common to Greek mythology. ..... (gemini-2.5-flash) The response is correct, but the model didn't stay within the bounds of the options. You also want the model to respond with just one of the options instead of a full sentence. In this case, you can rephrase the instructions as a multiple choice question and ask the model to choose an option. Prompt: Multiple choice problem: Which of the following options describes the book The Odyssey? Options: thriller sci-fi mythology biography Response: The correct answer is mythology.
(gemini-2.5-flash) Change the order of prompt content: The order of the content in the prompt can sometimes affect the response. Try changing the content order and see how that affects the response. Version 1: [examples] [context] [input] Version 2: [input] [examples] [context] Version 3: [examples] [input] [context] Fallback responses A fallback response is a response returned by the model when either the prompt or the response triggers a safety filter. An example of a fallback response is "I'm not able to help with that, as I'm only a language model." If the model responds with a fallback response, try increasing the temperature. Things to avoid Avoid relying on models to generate factual information. Use with care on math and logic problems. Generative models under the hood This section aims to answer the question: Is there randomness in generative models' responses, or are they deterministic? The short answer: yes to both. When you prompt a generative model, a text response is generated in two stages. In the first stage, the generative model processes the input prompt and generates a probability distribution over possible tokens (words) that are likely to come next. For example, if you prompt with the input text "The dog jumped over the ... ", the generative model will produce an array of probable next words: [("fence", 0.77), ("ledge", 0.12), ("blanket", 0.03), ...] This process is deterministic; a generative model will produce this same distribution every time it receives the same prompt text. In the second stage, the generative model converts these distributions into actual text responses through one of several decoding strategies. A simple decoding strategy might select the most likely token at every timestep. This process would always be deterministic. However, you could instead choose to generate a response by randomly sampling over the distribution returned by the model. This process would be stochastic (random).
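The two decoding strategies described above can be sketched in a few lines of Python. This is a minimal illustration and not how any real model decodes: it uses only the three tokens listed in the example distribution and drops the elided tail, so the weights do not sum to 1 (`random.choices` accepts unnormalized weights). The function names are made up for this example.

```python
import random

# First stage (assumed already done): the distribution from the example above.
# The elided "..." entries are omitted, so this is only a partial distribution.
distribution = [("fence", 0.77), ("ledge", 0.12), ("blanket", 0.03)]


def greedy_decode(dist):
    """Deterministic: always pick the most likely token."""
    return max(dist, key=lambda pair: pair[1])[0]


def sample_decode(dist, rng):
    """Stochastic: draw a token at random, weighted by its probability."""
    tokens = [token for token, _ in dist]
    weights = [prob for _, prob in dist]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random()
    print("greedy: ", [greedy_decode(distribution) for _ in range(5)])
    print("sampled:", [sample_decode(distribution, rng) for _ in range(5)])
```

Repeated greedy calls return the same token every time, while repeated sampled calls can differ from run to run even though the underlying distribution never changes.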
Control the degree of randomness allowed in this decoding process by setting the temperature. A temperature of 0 means only the most likely tokens are selected, and there's no randomness. Conversely, a high temperature injects a high degree of randomness into the tokens selected by the model, leading to more unexpected, surprising model responses. Next steps Now that you have a deeper understanding of prompt design, try writing your own prompts using Google AI Studio. To learn about multimodal prompting, see Prompting with media files. To learn about image prompting, see the Imagen prompt guide. To learn about video prompting, see the Veo prompt guide. Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Last updated 2025-04-28 UTC.
|