arshjaved committed
Commit 17c63c5 · verified · Parent: 61e803c

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes. See raw diff.
Files changed (50)
  1. text_content/docs_api-key_4f6972b4.txt +1 -1
  2. text_content/docs_api-versions_6bd15f7f.txt +5 -0
  3. text_content/docs_audio_6b3412bf.txt +1 -1
  4. text_content/docs_audio_ce171fe5.txt +1 -1
  5. text_content/docs_available-regions_f1ec4e93.txt +1 -1
  6. text_content/docs_batch-api_106cd8d4.txt +5 -0
  7. text_content/docs_batch-api_8dc15f59.txt +5 -0
  8. text_content/docs_batch-api_94dee6bb.txt +5 -0
  9. text_content/docs_batch-api_cbe479b9.txt +5 -0
  10. text_content/docs_batch-mode_27b99003.txt +2 -2
  11. text_content/docs_batch-mode_d49d3e09.txt +2 -2
  12. text_content/docs_billing_55ad0493.txt +1 -1
  13. text_content/docs_billing_932c132c.txt +1 -1
  14. text_content/docs_billing_c19877aa.txt +1 -1
  15. text_content/docs_code-execution_61a8afb4.txt +1 -1
  16. text_content/docs_document-processing_793c51fa.txt +1 -1
  17. text_content/docs_downloads_a1919083.txt +1 -1
  18. text_content/docs_downloads_a8fc2042.txt +1 -1
  19. text_content/docs_embeddings_2bb4dd63.txt +5 -0
  20. text_content/docs_embeddings_d8be8095.txt +5 -0
  21. text_content/docs_files_1488114d.txt +1 -1
  22. text_content/docs_files_1a591f45.txt +1 -1
  23. text_content/docs_files_2dd3ac4c.txt +1 -1
  24. text_content/docs_files_97554b81.txt +1 -1
  25. text_content/docs_function-calling_1c69a1f6.txt +1 -1
  26. text_content/docs_function-calling_8330f2c7.txt +1 -1
  27. text_content/docs_function-calling_cd2b43ee.txt +1 -1
  28. text_content/docs_function-calling_dbd56c28.txt +1 -1
  29. text_content/docs_function-calling_e09d4451.txt +1 -1
  30. text_content/docs_google-search_5a931e9e.txt +1 -1
  31. text_content/docs_grounding_25ac83ea.txt +1 -1
  32. text_content/docs_image-generation_45e977a9.txt +5 -0
  33. text_content/docs_image-generation_46bf3e9e.txt +2 -2
  34. text_content/docs_image-generation_56ff9b7f.txt +2 -2
  35. text_content/docs_image-generation_94c9fcfa.txt +5 -0
  36. text_content/docs_image-understanding_3665b2dc.txt +1 -1
  37. text_content/docs_image-understanding_f3fc24eb.txt +1 -1
  38. text_content/docs_libraries_c54e7094.txt +1 -1
  39. text_content/docs_live-guide_51ee6d26.txt +1 -1
  40. text_content/docs_live-session_21ef6dd5.txt +1 -1
  41. text_content/docs_live-session_910f1837.txt +1 -1
  42. text_content/docs_live-tools_141b2194.txt +1 -1
  43. text_content/docs_live-tools_e7e6ec29.txt +1 -1
  44. text_content/docs_live_1b8c48ea.txt +1 -1
  45. text_content/docs_migrate_8812abe1.txt +1 -1
  46. text_content/docs_model-tuning_41c0a534.txt +1 -1
  47. text_content/docs_models_110171ae.txt +5 -0
  48. text_content/docs_models_1a9a1694.txt +1 -1
  49. text_content/docs_models_302dc48c.txt +1 -1
  50. text_content/docs_models_4b13d9ba.txt +1 -1
text_content/docs_api-key_4f6972b4.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/api-key#provide-api-key-explicitly
  Title: Using Gemini API keys  |  Google AI for Developers
  ==================================================
 
- Using Gemini API keys | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Using Gemini API keys To use the Gemini API, you need an API key. You can create a key for free with a few clicks in Google AI Studio . Once you have an API key, you have the following options to connect to the Gemini API: Setting your API key as an environment variable Providing your API key explicitly For initial testing, you can hard code an API key, but this should only be temporary since it's not secure. You can find examples for hard coding the API key in Providing API key explicitly section. Setting API key as environment variable If you set the environment variable GEMINI_API_KEY or GOOGLE_API_KEY , the API key will automatically be picked up by the client when using one of the Gemini API libraries . It's recommended that you set only one of those variables, but if both are set, GOOGLE_API_KEY takes precedence. If you're using the REST API, or JavaScript on the browser, you will need to provide the API key explicitly. Here is how you can set your API key locally as the environment variable GEMINI_API_KEY with different operating systems. Linux/macOS - Bash Bash is a common Linux and macOS terminal configuration. You can check if you have a configuration file for it by running the following command: ~/.bashrc If the response is "No such file or directory", you will need to create this file and open it by running the following commands, or use zsh : touch ~/.bashrc open ~/.bashrc Next, you need to set your API key by adding the following export command: export GEMINI_API_KEY = <YOUR_API_KEY_HERE> After saving the file, apply the changes by running: source ~/.bashrc macOS - Zsh Zsh is a common Linux and macOS terminal configuration. You can check if you have a configuration file for it by running the following command: ~/.zshrc If the response is "No such file or directory", you will need to create this file and open it by running the following commands, or use bash : touch ~/.zshrc open ~/.zshrc Next, you need to set your API key by adding the following export command: export GEMINI_API_KEY = <YOUR_API_KEY_HERE> After saving the file, apply the changes by running: source ~/.zshrc Windows Search for "Environment Variables" in the system settings Edit either "User variables" (for current user) or "System variables" (for all users - use with caution). Create the variable and add export GEMINI_API_KEY=your_key_here Apply the changes Providing API key explicitly In some cases, you may want to explicitly provide an API key. For example: You're doing a simple API call and prefer hard coding the API key. You want explicit control without having to rely on automatic discovery of environment variables by the Gemini API libraries You're using an environment where environment variables are not supported (e.g web) or you are making REST calls. Below are examples for how you can provide an API key explicitly: Python from google import genai client = genai . Client ( api_key = " YOUR_API_KEY " ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Explain how AI works in a few words" ) print ( response . 
text ) JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : " YOUR_API_KEY " }); async function main () { const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : "Explain how AI works in a few words" , }); console . log ( response . text ); } main (); Go package main import ( "context" "fmt" "log" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , & genai . ClientConfig { APIKey : " YOUR_API_KEY " , Backend : genai . BackendGeminiAPI , }) if err != nil { log . Fatal ( err ) } result , err := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , genai . Text ( "Explain how AI works in a few words" ), nil , ) if err != nil { log . Fatal ( err ) } fmt . Println ( result . Text ()) } Java package com.example ; import com.google.genai.Client ; import com.google.genai.types.GenerateContentResponse ; public class GenerateTextFromTextInput { public static void main ( String [] args ) { Client client = Client . builder (). apiKey ( " YOUR_API_KEY " ). build (); GenerateContentResponse response = client . models . generateContent ( "gemini-2.5-flash" , "Explain how AI works in a few words" , null ); System . out . println ( response . text ()); } } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key= $ YOUR_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [ { "parts": [ { "text": "Explain how AI works in a few words" } ] } ] }' Keep your API key secure Treat your Gemini API key like a password. If compromised, others can use your project's quota, incur charges (if billing is enabled), and access your private data, such as files. Critical security rules Never commit API keys to source control. Do not check your API key into version control systems like Git. Never expose API keys on the client-side. Do not use your API key directly in web or mobile apps in production. Keys in client-side code (including our JavaScript/TypeScript libraries and REST calls) can be extracted. Best practices Use server-side calls with API keys The most secure way to use your API key is to call the Gemini API from a server-side application where the key can be kept confidential. Use ephemeral tokens for client-side access (Live API only): For direct client-side access to the Live API, you can use ephemeral tokens. They come with lower security risks and can be suitable for production use. Review ephemeral tokens guide for more information. Consider adding restrictions to your key: You can limit a key's permissions by adding API key restrictions . This minimizes the potential damage if the key is ever leaked. For some general best practices, you can also review this support article . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
 
+ Using Gemini API keys | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Using Gemini API keys To use the Gemini API, you need an API key. You can create a key for free with a few clicks in Google AI Studio . Once you have an API key, you have the following options to connect to the Gemini API: Setting your API key as an environment variable Providing your API key explicitly For initial testing, you can hard code an API key, but this should only be temporary since it's not secure. You can find examples for hard coding the API key in Providing API key explicitly section. Setting the API key as an environment variable If you set the environment variable GEMINI_API_KEY or GOOGLE_API_KEY , the API key will automatically be picked up by the client when using one of the Gemini API libraries . It's recommended that you set only one of those variables, but if both are set, GOOGLE_API_KEY takes precedence. If you're using the REST API, or JavaScript on the browser, you will need to provide the API key explicitly. Here is how you can set your API key locally as the environment variable GEMINI_API_KEY with different operating systems. Linux/macOS - Bash Bash is a common Linux and macOS terminal configuration. You can check if you have a configuration file for it by running the following command: ~/.bashrc If the response is "No such file or directory", you will need to create this file and open it by running the following commands, or use zsh : touch ~/.bashrc open ~/.bashrc Next, you need to set your API key by adding the following export command: export GEMINI_API_KEY = <YOUR_API_KEY_HERE> After saving the file, apply the changes by running: source ~/.bashrc macOS - Zsh Zsh is a common Linux and macOS terminal configuration. You can check if you have a configuration file for it by running the following command: ~/.zshrc If the response is "No such file or directory", you will need to create this file and open it by running the following commands, or use bash : touch ~/.zshrc open ~/.zshrc Next, you need to set your API key by adding the following export command: export GEMINI_API_KEY = <YOUR_API_KEY_HERE> After saving the file, apply the changes by running: source ~/.zshrc Windows Search for "Environment Variables" in the search bar. Choose to modify System Settings . You may have to confirm you want to do this. In the system settings dialog, click the button labeled Environment Variables . Under either User variables (for the current user) or System variables (applies to all users who use the machine), click New... Specify the variable name as GEMINI_API_KEY . Specify your Gemini API Key as the variable value. Click OK to apply the changes. Open a new terminal session (cmd or Powershell) to get the new variable. Providing the API key explicitly In some cases, you may want to explicitly provide an API key. For example: You're doing a simple API call and prefer hard coding the API key. You want explicit control without having to rely on automatic discovery of environment variables by the Gemini API libraries You're using an environment where environment variables are not supported (e.g web) or you are making REST calls. 
Below are examples for how you can provide an API key explicitly: Python from google import genai client = genai . Client ( api_key = " YOUR_API_KEY " ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Explain how AI works in a few words" ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : " YOUR_API_KEY " }); async function main () { const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : "Explain how AI works in a few words" , }); console . log ( response . text ); } main (); Go package main import ( "context" "fmt" "log" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , & genai . ClientConfig { APIKey : " YOUR_API_KEY " , Backend : genai . BackendGeminiAPI , }) if err != nil { log . Fatal ( err ) } result , err := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , genai . Text ( "Explain how AI works in a few words" ), nil , ) if err != nil { log . Fatal ( err ) } fmt . Println ( result . Text ()) } Java package com.example ; import com.google.genai.Client ; import com.google.genai.types.GenerateContentResponse ; public class GenerateTextFromTextInput { public static void main ( String [] args ) { Client client = Client . builder (). apiKey ( " YOUR_API_KEY " ). build (); GenerateContentResponse response = client . models . generateContent ( "gemini-2.5-flash" , "Explain how AI works in a few words" , null ); System . out . println ( response . text ()); } } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H 'Content-Type: application/json' \ -H "x-goog-api-key: YOUR_API_KEY " \ -X POST \ -d '{ "contents": [ { "parts": [ { "text": "Explain how AI works in a few words" } ] } ] }' Keep your API key secure Treat your Gemini API key like a password. If compromised, others can use your project's quota, incur charges (if billing is enabled), and access your private data, such as files. Critical security rules Never commit API keys to source control. Do not check your API key into version control systems like Git. Never expose API keys on the client-side. Do not use your API key directly in web or mobile apps in production. Keys in client-side code (including our JavaScript/TypeScript libraries and REST calls) can be extracted. Best practices Use server-side calls with API keys The most secure way to use your API key is to call the Gemini API from a server-side application where the key can be kept confidential. Use ephemeral tokens for client-side access (Live API only): For direct client-side access to the Live API, you can use ephemeral tokens. They come with lower security risks and can be suitable for production use. Review ephemeral tokens guide for more information. Consider adding restrictions to your key: You can limit a key's permissions by adding API key restrictions . This minimizes the potential damage if the key is ever leaked. For some general best practices, you can also review this support article . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-09 UTC.
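The page above notes that the client libraries pick up GEMINI_API_KEY (or GOOGLE_API_KEY) automatically, while its explicit examples only show hard-coded keys. A minimal Python sketch of the environment-variable path, assuming the google-genai package used throughout the page:
Python
import os
from google import genai

# Per the page above, the client reads GEMINI_API_KEY (or GOOGLE_API_KEY)
# automatically when no api_key argument is passed.
if not (os.environ.get("GEMINI_API_KEY") or os.environ.get("GOOGLE_API_KEY")):
    raise RuntimeError("Set GEMINI_API_KEY before running this sketch.")

client = genai.Client()  # no api_key argument: key comes from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain how AI works in a few words",
)
print(response.text)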
text_content/docs_api-versions_6bd15f7f.txt ADDED
@@ -0,0 +1,5 @@
+ URL: https://ai.google.dev/gemini-api/docs/api-versions#main-content
+ Title: API versions explained  |  Gemini API  |  Google AI for Developers
+ ==================================================
+
+ API versions explained | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API API Reference Send feedback API versions explained This document provides a high-level overview of the differences between the v1 and v1beta versions of the Gemini API. v1 : Stable version of the API. Features in the stable version are fully-supported over the lifetime of the major version. If there are any breaking changes, then the next major version of the API will be created and the existing version will be deprecated after a reasonable period of time. Non-breaking changes may be introduced to the API without changing the major version. v1beta : This version includes early-access features that may be under development and is subject to rapid and breaking changes. There is also no guarantee that the features in the Beta version will move to the stable version. Due to this instability, you should consider not launching production applications with this version. Feature v1 v1beta Generate Content - Text-only input Generate Content - Text-and-image input Generate Content - Text output Generate Content - Multi-turn conversations (chat) Generate Content - Function calls Generate Content - Streaming Embed Content - Text-only input Generate Answer Semantic retriever - Supported - Will never be supported Configure API version in an SDK The Gemini API SDK's default to v1beta , but you can opt to use other versions by setting the API version as shown in the following code sample: Python from google import genai client = genai . Client ( http_options = { 'api_version' : 'v1alpha' }) response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = "Explain how AI works" , ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ httpOptions : { apiVersion : "v1alpha" }, }); async function main () { const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : "Explain how AI works" , }); console . log ( response . text ); } await main (); REST curl "https://generativelanguage.googleapis.com/v1/models/gemini-1.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[{"text": "Explain how AI works."}] }] }' Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
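The page above configures v1alpha through http_options; the same pattern should apply for pinning the stable version. A short sketch, assuming "v1" is accepted as an api_version value by the google-genai client:
Python
from google import genai

# Same http_options pattern the page shows for v1alpha, here pinning v1.
# Assumption: "v1" is a valid api_version string for this client.
client = genai.Client(http_options={"api_version": "v1"})

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Explain how AI works",
)
print(response.text)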
text_content/docs_audio_6b3412bf.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/audio
  Title: Audio understanding  |  Gemini API  |  Google AI for Developers
  ==================================================
 
- Audio understanding | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Audio understanding Gemini can analyze and understand audio input, enabling use cases like the following: Describe, summarize, or answer questions about audio content. Provide a transcription of the audio. Analyze specific segments of the audio. This guide shows you how to use the Gemini API to generate a text response to audio input. Before you begin Before calling the Gemini API, ensure you have your SDK of choice installed, and a Gemini API key configured and ready to use. Input audio You can provide audio data to Gemini in the following ways: Upload an audio file before making a request to generateContent . Pass inline audio data with the request to generateContent . Upload an audio file You can use the Files API to upload an audio file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. The following code uploads an audio file and then uses the file in a call to generateContent . Python from google import genai client = genai . Client () myfile = client . files . upload ( file = "path/to/sample.mp3" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "Describe this audio clip" , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mp3" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Describe this audio clip" , ]), }); console . log ( response . text ); } await main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Describe this audio clip" ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST AUDIO_PATH = "path/to/sample.mp3" MIME_TYPE = $( file -b --mime-type " ${ AUDIO_PATH } " ) NUM_BYTES = $( wc -c < " ${ AUDIO_PATH } " ) DISPLAY_NAME = AUDIO tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. 
curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ AUDIO_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Describe this audio clip"}, {"file_data":{"mime_type": "${MIME_TYPE}", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json To learn more about working with media files, see Files API . Pass audio data inline Instead of uploading an audio file, you can pass inline audio data in the request to generateContent : Python from google.genai import types with open ( 'path/to/small-sample.mp3' , 'rb' ) as f : audio_bytes = f . read () response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ 'Describe this audio clip' , types . Part . from_bytes ( data = audio_bytes , mime_type = 'audio/mp3' , ) ] ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const base64AudioFile = fs . readFileSync ( "path/to/small-sample.mp3" , { encoding : "base64" , }); const contents = [ { text : "Please summarize the audio." }, { inlineData : { mimeType : "audio/mp3" , data : base64AudioFile , }, }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents , }); console . log ( response . text ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } audioBytes , _ := os . ReadFile ( "/path/to/small-sample.mp3" ) parts := [] * genai . Part { genai . NewPartFromText ( "Describe this audio clip" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "audio/mp3" , Data : audioBytes , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } A few things to keep in mind about inline audio data: The maximum request size is 20 MB, which includes text prompts, system instructions, and files provided inline. If your file's size will make the total request size exceed 20 MB, then use the Files API to upload an audio file for use in the request. If you're using an audio sample multiple times, it's more efficient to upload an audio file . 
Get a transcript To get a transcript of audio data, just ask for it in the prompt: Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) prompt = 'Generate a transcript of the speech.' response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ prompt , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Generate a transcript of the speech." , ]), }); console . log ( "result.text=" , result . text ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Generate a transcript of the speech." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } Refer to timestamps You can refer to specific sections of an audio file using timestamps of the form MM:SS . For example, the following prompt requests a transcript that Starts at 2 minutes 30 seconds from the beginning of the file. Ends at 3 minutes 29 seconds from the beginning of the file. Python # Create a prompt containing timestamps. prompt = "Provide a transcript of the speech from 02:30 to 03:29." JavaScript // Create a prompt containing timestamps. const prompt = "Provide a transcript of the speech from 02:30 to 03:29." Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Provide a transcript of the speech " + "between the timestamps 02:30 and 03:29." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } Count tokens Call the countTokens method to get a count of the number of tokens in an audio file. For example: Python response = client . models . count_tokens ( model = 'gemini-2.5-flash' , contents = [ myfile ] ) print ( response ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const countTokensResponse = await ai . models . countTokens ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . 
mimeType ), ]), }); console . log ( countTokensResponse . totalTokens ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } tokens , _ := client . Models . CountTokens ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Printf ( "File %s is %d tokens\n" , localAudioPath , tokens . TotalTokens ) } Supported audio formats Gemini supports the following audio format MIME types: WAV - audio/wav MP3 - audio/mp3 AIFF - audio/aiff AAC - audio/aac OGG Vorbis - audio/ogg FLAC - audio/flac Technical details about audio Gemini represents each second of audio as 32 tokens; for example, one minute of audio is represented as 1,920 tokens. Gemini can "understand" non-speech components, such as birdsong or sirens. The maximum supported length of audio data in a single prompt is 9.5 hours. Gemini doesn't limit the number of audio files in a single prompt; however, the total combined length of all audio files in a single prompt can't exceed 9.5 hours. Gemini downsamples audio files to a 16 Kbps data resolution. If the audio source contains multiple channels, Gemini combines those channels into a single channel. What's next This guide shows how to generate text in response to audio data. To learn more, see the following resources: File prompting strategies : The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting. System instructions : System instructions let you steer the behavior of the model based on your specific needs and use cases. Safety guidance : Sometimes generative AI models produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
 
+ Audio understanding | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Audio understanding Gemini can analyze and understand audio input, enabling use cases like the following: Describe, summarize, or answer questions about audio content. Provide a transcription of the audio. Analyze specific segments of the audio. This guide shows you how to use the Gemini API to generate a text response to audio input. Before you begin Before calling the Gemini API, ensure you have your SDK of choice installed, and a Gemini API key configured and ready to use. Input audio You can provide audio data to Gemini in the following ways: Upload an audio file before making a request to generateContent . Pass inline audio data with the request to generateContent . Upload an audio file You can use the Files API to upload an audio file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. The following code uploads an audio file and then uses the file in a call to generateContent . Python from google import genai client = genai . Client () myfile = client . files . upload ( file = "path/to/sample.mp3" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "Describe this audio clip" , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mp3" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Describe this audio clip" , ]), }); console . log ( response . text ); } await main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Describe this audio clip" ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST AUDIO_PATH = "path/to/sample.mp3" MIME_TYPE = $( file -b --mime-type " ${ AUDIO_PATH } " ) NUM_BYTES = $( wc -c < " ${ AUDIO_PATH } " ) DISPLAY_NAME = AUDIO tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. 
curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ AUDIO_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Describe this audio clip"}, {"file_data":{"mime_type": "${MIME_TYPE}", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json To learn more about working with media files, see Files API . Pass audio data inline Instead of uploading an audio file, you can pass inline audio data in the request to generateContent : Python from google.genai import types with open ( 'path/to/small-sample.mp3' , 'rb' ) as f : audio_bytes = f . read () response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ 'Describe this audio clip' , types . Part . from_bytes ( data = audio_bytes , mime_type = 'audio/mp3' , ) ] ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const base64AudioFile = fs . readFileSync ( "path/to/small-sample.mp3" , { encoding : "base64" , }); const contents = [ { text : "Please summarize the audio." }, { inlineData : { mimeType : "audio/mp3" , data : base64AudioFile , }, }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents , }); console . log ( response . text ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } audioBytes , _ := os . ReadFile ( "/path/to/small-sample.mp3" ) parts := [] * genai . Part { genai . NewPartFromText ( "Describe this audio clip" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "audio/mp3" , Data : audioBytes , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } A few things to keep in mind about inline audio data: The maximum request size is 20 MB, which includes text prompts, system instructions, and files provided inline. If your file's size will make the total request size exceed 20 MB, then use the Files API to upload an audio file for use in the request. If you're using an audio sample multiple times, it's more efficient to upload an audio file . 
Get a transcript To get a transcript of audio data, just ask for it in the prompt: Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) prompt = 'Generate a transcript of the speech.' response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ prompt , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Generate a transcript of the speech." , ]), }); console . log ( "result.text=" , result . text ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Generate a transcript of the speech." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } Refer to timestamps You can refer to specific sections of an audio file using timestamps of the form MM:SS . For example, the following prompt requests a transcript that Starts at 2 minutes 30 seconds from the beginning of the file. Ends at 3 minutes 29 seconds from the beginning of the file. Python # Create a prompt containing timestamps. prompt = "Provide a transcript of the speech from 02:30 to 03:29." JavaScript // Create a prompt containing timestamps. const prompt = "Provide a transcript of the speech from 02:30 to 03:29." Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Provide a transcript of the speech " + "between the timestamps 02:30 and 03:29." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } Count tokens Call the countTokens method to get a count of the number of tokens in an audio file. For example: Python response = client . models . count_tokens ( model = 'gemini-2.5-flash' , contents = [ myfile ] ) print ( response ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const countTokensResponse = await ai . models . countTokens ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . 
mimeType ), ]), }); console . log ( countTokensResponse . totalTokens ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } tokens , _ := client . Models . CountTokens ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Printf ( "File %s is %d tokens\n" , localAudioPath , tokens . TotalTokens ) } Supported audio formats Gemini supports the following audio format MIME types: WAV - audio/wav MP3 - audio/mp3 AIFF - audio/aiff AAC - audio/aac OGG Vorbis - audio/ogg FLAC - audio/flac Technical details about audio Gemini represents each second of audio as 32 tokens; for example, one minute of audio is represented as 1,920 tokens. Gemini can "understand" non-speech components, such as birdsong or sirens. The maximum supported length of audio data in a single prompt is 9.5 hours. Gemini doesn't limit the number of audio files in a single prompt; however, the total combined length of all audio files in a single prompt can't exceed 9.5 hours. Gemini downsamples audio files to a 16 Kbps data resolution. If the audio source contains multiple channels, Gemini combines those channels into a single channel. What's next This guide shows how to generate text in response to audio data. To learn more, see the following resources: File prompting strategies : The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting. System instructions : System instructions let you steer the behavior of the model based on your specific needs and use cases. Safety guidance : Sometimes generative AI models produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
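The audio page above states a rate of 32 tokens per second of audio (so one minute is 1,920 tokens) and a 9.5-hour per-prompt limit. A small sketch of that arithmetic, using a hypothetical estimate_audio_tokens helper; the authoritative count still comes from the countTokens call shown above:
Python
# Documented rate: 32 tokens per second of audio -> one minute is 1,920 tokens.
AUDIO_TOKENS_PER_SECOND = 32
MAX_AUDIO_SECONDS = 9.5 * 60 * 60  # 9.5-hour per-prompt limit from the page

def estimate_audio_tokens(duration_seconds: float) -> int:
    # Hypothetical helper: a rough pre-flight estimate only; use countTokens
    # for the real number.
    if duration_seconds > MAX_AUDIO_SECONDS:
        raise ValueError("audio exceeds the 9.5 hour per-prompt limit")
    return int(duration_seconds * AUDIO_TOKENS_PER_SECOND)

print(estimate_audio_tokens(60))   # 1920 tokens for one minute
print(estimate_audio_tokens(330))  # 10560 tokens for a 5.5-minute clip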
text_content/docs_audio_ce171fe5.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/audio#upload-audio
  Title: Audio understanding  |  Gemini API  |  Google AI for Developers
  ==================================================
 
- Audio understanding | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Audio understanding Gemini can analyze and understand audio input, enabling use cases like the following: Describe, summarize, or answer questions about audio content. Provide a transcription of the audio. Analyze specific segments of the audio. This guide shows you how to use the Gemini API to generate a text response to audio input. Before you begin Before calling the Gemini API, ensure you have your SDK of choice installed, and a Gemini API key configured and ready to use. Input audio You can provide audio data to Gemini in the following ways: Upload an audio file before making a request to generateContent . Pass inline audio data with the request to generateContent . Upload an audio file You can use the Files API to upload an audio file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. The following code uploads an audio file and then uses the file in a call to generateContent . Python from google import genai client = genai . Client () myfile = client . files . upload ( file = "path/to/sample.mp3" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "Describe this audio clip" , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mp3" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Describe this audio clip" , ]), }); console . log ( response . text ); } await main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Describe this audio clip" ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST AUDIO_PATH = "path/to/sample.mp3" MIME_TYPE = $( file -b --mime-type " ${ AUDIO_PATH } " ) NUM_BYTES = $( wc -c < " ${ AUDIO_PATH } " ) DISPLAY_NAME = AUDIO tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. 
curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ AUDIO_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Describe this audio clip"}, {"file_data":{"mime_type": "${MIME_TYPE}", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json To learn more about working with media files, see Files API . Pass audio data inline Instead of uploading an audio file, you can pass inline audio data in the request to generateContent : Python from google.genai import types with open ( 'path/to/small-sample.mp3' , 'rb' ) as f : audio_bytes = f . read () response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ 'Describe this audio clip' , types . Part . from_bytes ( data = audio_bytes , mime_type = 'audio/mp3' , ) ] ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const base64AudioFile = fs . readFileSync ( "path/to/small-sample.mp3" , { encoding : "base64" , }); const contents = [ { text : "Please summarize the audio." }, { inlineData : { mimeType : "audio/mp3" , data : base64AudioFile , }, }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents , }); console . log ( response . text ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } audioBytes , _ := os . ReadFile ( "/path/to/small-sample.mp3" ) parts := [] * genai . Part { genai . NewPartFromText ( "Describe this audio clip" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "audio/mp3" , Data : audioBytes , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } A few things to keep in mind about inline audio data: The maximum request size is 20 MB, which includes text prompts, system instructions, and files provided inline. If your file's size will make the total request size exceed 20 MB, then use the Files API to upload an audio file for use in the request. If you're using an audio sample multiple times, it's more efficient to upload an audio file . 
Get a transcript To get a transcript of audio data, just ask for it in the prompt: Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) prompt = 'Generate a transcript of the speech.' response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ prompt , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Generate a transcript of the speech." , ]), }); console . log ( "result.text=" , result . text ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Generate a transcript of the speech." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } Refer to timestamps You can refer to specific sections of an audio file using timestamps of the form MM:SS . For example, the following prompt requests a transcript that Starts at 2 minutes 30 seconds from the beginning of the file. Ends at 3 minutes 29 seconds from the beginning of the file. Python # Create a prompt containing timestamps. prompt = "Provide a transcript of the speech from 02:30 to 03:29." JavaScript // Create a prompt containing timestamps. const prompt = "Provide a transcript of the speech from 02:30 to 03:29." Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Provide a transcript of the speech " + "between the timestamps 02:30 and 03:29." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } Count tokens Call the countTokens method to get a count of the number of tokens in an audio file. For example: Python response = client . models . count_tokens ( model = 'gemini-2.5-flash' , contents = [ myfile ] ) print ( response ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const countTokensResponse = await ai . models . countTokens ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . 
mimeType ), ]), }); console . log ( countTokensResponse . totalTokens ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } tokens , _ := client . Models . CountTokens ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Printf ( "File %s is %d tokens\n" , localAudioPath , tokens . TotalTokens ) } Supported audio formats Gemini supports the following audio format MIME types: WAV - audio/wav MP3 - audio/mp3 AIFF - audio/aiff AAC - audio/aac OGG Vorbis - audio/ogg FLAC - audio/flac Technical details about audio Gemini represents each second of audio as 32 tokens; for example, one minute of audio is represented as 1,920 tokens. Gemini can "understand" non-speech components, such as birdsong or sirens. The maximum supported length of audio data in a single prompt is 9.5 hours. Gemini doesn't limit the number of audio files in a single prompt; however, the total combined length of all audio files in a single prompt can't exceed 9.5 hours. Gemini downsamples audio files to a 16 Kbps data resolution. If the audio source contains multiple channels, Gemini combines those channels into a single channel. What's next This guide shows how to generate text in response to audio data. To learn more, see the following resources: File prompting strategies : The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting. System instructions : System instructions let you steer the behavior of the model based on your specific needs and use cases. Safety guidance : Sometimes generative AI models produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
 
2
  Title: Audio understanding  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Audio understanding | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Audio understanding Gemini can analyze and understand audio input, enabling use cases like the following: Describe, summarize, or answer questions about audio content. Provide a transcription of the audio. Analyze specific segments of the audio. This guide shows you how to use the Gemini API to generate a text response to audio input. Before you begin Before calling the Gemini API, ensure you have your SDK of choice installed, and a Gemini API key configured and ready to use. Input audio You can provide audio data to Gemini in the following ways: Upload an audio file before making a request to generateContent . Pass inline audio data with the request to generateContent . Upload an audio file You can use the Files API to upload an audio file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. The following code uploads an audio file and then uses the file in a call to generateContent . Python from google import genai client = genai . Client () myfile = client . files . upload ( file = "path/to/sample.mp3" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "Describe this audio clip" , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mp3" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Describe this audio clip" , ]), }); console . log ( response . text ); } await main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Describe this audio clip" ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST AUDIO_PATH = "path/to/sample.mp3" MIME_TYPE = $( file -b --mime-type " ${ AUDIO_PATH } " ) NUM_BYTES = $( wc -c < " ${ AUDIO_PATH } " ) DISPLAY_NAME = AUDIO tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. 
curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ AUDIO_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Describe this audio clip"}, {"file_data":{"mime_type": "${MIME_TYPE}", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json To learn more about working with media files, see Files API . Pass audio data inline Instead of uploading an audio file, you can pass inline audio data in the request to generateContent : Python from google.genai import types with open ( 'path/to/small-sample.mp3' , 'rb' ) as f : audio_bytes = f . read () response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ 'Describe this audio clip' , types . Part . from_bytes ( data = audio_bytes , mime_type = 'audio/mp3' , ) ] ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const base64AudioFile = fs . readFileSync ( "path/to/small-sample.mp3" , { encoding : "base64" , }); const contents = [ { text : "Please summarize the audio." }, { inlineData : { mimeType : "audio/mp3" , data : base64AudioFile , }, }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents , }); console . log ( response . text ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } audioBytes , _ := os . ReadFile ( "/path/to/small-sample.mp3" ) parts := [] * genai . Part { genai . NewPartFromText ( "Describe this audio clip" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "audio/mp3" , Data : audioBytes , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } A few things to keep in mind about inline audio data: The maximum request size is 20 MB, which includes text prompts, system instructions, and files provided inline. If your file's size will make the total request size exceed 20 MB, then use the Files API to upload an audio file for use in the request. If you're using an audio sample multiple times, it's more efficient to upload an audio file . 
Get a transcript To get a transcript of audio data, just ask for it in the prompt: Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) prompt = 'Generate a transcript of the speech.' response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ prompt , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Generate a transcript of the speech." , ]), }); console . log ( "result.text=" , result . text ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Generate a transcript of the speech." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } Refer to timestamps You can refer to specific sections of an audio file using timestamps of the form MM:SS . For example, the following prompt requests a transcript that Starts at 2 minutes 30 seconds from the beginning of the file. Ends at 3 minutes 29 seconds from the beginning of the file. Python # Create a prompt containing timestamps. prompt = "Provide a transcript of the speech from 02:30 to 03:29." JavaScript // Create a prompt containing timestamps. const prompt = "Provide a transcript of the speech from 02:30 to 03:29." Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromText ( "Provide a transcript of the speech " + "between the timestamps 02:30 and 03:29." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } Count tokens Call the countTokens method to get a count of the number of tokens in an audio file. For example: Python response = client . models . count_tokens ( model = 'gemini-2.5-flash' , contents = [ myfile ] ) print ( response ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const countTokensResponse = await ai . models . countTokens ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . 
mimeType ), ]), }); console . log ( countTokensResponse . totalTokens ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } localAudioPath := "/path/to/sample.mp3" uploadedFile , _ := client . Files . UploadFromPath ( ctx , localAudioPath , nil , ) parts := [] * genai . Part { genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } tokens , _ := client . Models . CountTokens ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Printf ( "File %s is %d tokens\n" , localAudioPath , tokens . TotalTokens ) } Supported audio formats Gemini supports the following audio format MIME types: WAV - audio/wav MP3 - audio/mp3 AIFF - audio/aiff AAC - audio/aac OGG Vorbis - audio/ogg FLAC - audio/flac Technical details about audio Gemini represents each second of audio as 32 tokens; for example, one minute of audio is represented as 1,920 tokens. Gemini can "understand" non-speech components, such as birdsong or sirens. The maximum supported length of audio data in a single prompt is 9.5 hours. Gemini doesn't limit the number of audio files in a single prompt; however, the total combined length of all audio files in a single prompt can't exceed 9.5 hours. Gemini downsamples audio files to a 16 Kbps data resolution. If the audio source contains multiple channels, Gemini combines those channels into a single channel. What's next This guide shows how to generate text in response to audio data. To learn more, see the following resources: File prompting strategies : The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting. System instructions : System instructions let you steer the behavior of the model based on your specific needs and use cases. Safety guidance : Sometimes generative AI models produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
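Because timestamps use the MM:SS form shown above, a long recording can be transcribed in consecutive windows. The Python loop below is a sketch of that idea; the recording length, the 2.5-minute window size, and the prompt wording are arbitrary choices, and the file path is a placeholder.

# Sketch: build MM:SS-windowed transcript prompts for a longer recording.
from google import genai

client = genai.Client()
myfile = client.files.upload(file="path/to/sample.mp3")  # assumed long recording

def mmss(seconds: int) -> str:
    # Format seconds as MM:SS, the timestamp form used in the prompts above.
    return f"{seconds // 60:02d}:{seconds % 60:02d}"

duration_seconds = 10 * 60   # assumed length of the recording
window_seconds = 150         # 2.5-minute windows (an arbitrary choice)

for start in range(0, duration_seconds, window_seconds):
    end = min(start + window_seconds, duration_seconds) - 1
    prompt = f"Provide a transcript of the speech from {mmss(start)} to {mmss(end)}."
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[prompt, myfile],
    )
    print(f"--- {mmss(start)} to {mmss(end)} ---")
    print(response.text)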
text_content/docs_available-regions_f1ec4e93.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/available-regions
2
  Title: Available regions for Google AI Studio and Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Available regions for Google AI Studio and Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Available regions for Google AI Studio and Gemini API If you reached this page after trying to open Google AI Studio , it may be because Google AI Studio is not available in your region, or you don't meet the age requirements (18+) for access. You can learn more about the available regions in the following section and other requirements in the terms of service . Available regions Note: For Colab users - Region restrictions are applied based on the region that the Colab instance is in, not the region that the user is in. You can check the location of the Colab instance using !curl ipinfo.io The Gemini API and Google AI Studio are available in the following countries and territories. If you're not in one of these countries or territories, try the Gemini API in Vertex AI : Albania Algeria American Samoa Angola Anguilla Antarctica Antigua and Barbuda Argentina Armenia Aruba Australia Austria Azerbaijan The Bahamas Bahrain Bangladesh Barbados Belgium Belize Benin Bermuda Bhutan Bolivia Bosnia Botswana Brazil British Indian Ocean Territory British Virgin Islands Brunei Bulgaria Burkina Faso Burundi Cabo Verde Cambodia Cameroon Canada Caribbean Netherlands Cayman Islands Central African Republic Chad Chile Christmas Island Cocos (Keeling) Islands Colombia Comoros Cook Islands Côte d'Ivoire Costa Rica Croatia Curaçao Czech Republic Democratic Republic of the Congo Denmark Djibouti Dominica Dominican Republic Ecuador Egypt El Salvador Equatorial Guinea Eritrea Estonia Eswatini Ethiopia Falkland Islands (Islas Malvinas) Faroe Islands Fiji Finland France Gabon The Gambia Georgia Germany Ghana Gibraltar Greece Greenland Grenada Guam Guatemala Guernsey Guinea Guinea-Bissau Guyana Haiti Heard Island and McDonald Islands Herzegovina Honduras Hungary Iceland India Indonesia Iraq Ireland Isle of Man Israel Italy Jamaica Japan Jersey Jordan Kazakhstan Kenya Kiribati Kosovo Kyrgyzstan Kuwait Laos Latvia Lebanon Lesotho Liberia Libya Liechtenstein Lithuania Luxembourg Madagascar Malawi Malaysia Maldives Mali Malta Marshall Islands Mauritania Mauritius Mexico Micronesia Mongolia Montenegro Montserrat Morocco Mozambique Namibia Nauru Nepal Netherlands New Caledonia New Zealand Nicaragua Niger Nigeria Niue Norfolk Island North Macedonia Northern Mariana Islands Norway Oman Pakistan Palau Palestine Panama Papua New Guinea Paraguay Peru Philippines Pitcairn Islands Poland Portugal Puerto Rico Qatar Republic of Cyprus Republic of the Congo Romania Rwanda Saint Barthélemy Saint Kitts and Nevis Saint Lucia Saint Pierre and Miquelon Saint Vincent and the Grenadines Saint Helena, Ascension and Tristan da Cunha Samoa São Tomé and Príncipe Saudi Arabia Senegal Serbia Seychelles Sierra Leone Singapore Slovakia Slovenia Solomon Islands Somalia South Africa South Georgia and the South Sandwich Islands South Korea South Sudan Spain Sri Lanka Sudan Suriname Sweden Switzerland Taiwan Tajikistan Tanzania Thailand Timor-Leste Togo Tokelau Tonga Trinidad and Tobago Tunisia Türkiye Turkmenistan Turks and Caicos Islands Tuvalu Uganda Ukraine United Kingdom United Arab Emirates United States United States Minor 
Outlying Islands U.S. Virgin Islands Uruguay Uzbekistan Vanuatu Venezuela Vietnam Wallis and Futuna Western Sahara Yemen Zambia Zimbabwe Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-02-25 UTC.
 
2
  Title: Available regions for Google AI Studio and Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Available regions for Google AI Studio and Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Available regions for Google AI Studio and Gemini API If you reached this page after trying to open Google AI Studio , it may be because Google AI Studio is not available in your region, or you don't meet the age requirements (18+) for access. You can learn more about the available regions in the following section and other requirements in the terms of service . Available regions Note: For Colab users - Region restrictions are applied based on the region that the Colab instance is in, not the region that the user is in. You can check the location of the Colab instance using !curl ipinfo.io The Gemini API and Google AI Studio are available in the following countries and territories. If you're not in one of these countries or territories, try the Gemini API in Vertex AI : Albania Algeria American Samoa Angola Anguilla Antarctica Antigua and Barbuda Argentina Armenia Aruba Australia Austria Azerbaijan The Bahamas Bahrain Bangladesh Barbados Belgium Belize Benin Bermuda Bhutan Bolivia Bosnia Botswana Brazil British Indian Ocean Territory British Virgin Islands Brunei Bulgaria Burkina Faso Burundi Cabo Verde Cambodia Cameroon Canada Caribbean Netherlands Cayman Islands Central African Republic Chad Chile Christmas Island Cocos (Keeling) Islands Colombia Comoros Cook Islands Côte d'Ivoire Costa Rica Croatia Curaçao Czech Republic Democratic Republic of the Congo Denmark Djibouti Dominica Dominican Republic Ecuador Egypt El Salvador Equatorial Guinea Eritrea Estonia Eswatini Ethiopia Falkland Islands (Islas Malvinas) Faroe Islands Fiji Finland France Gabon The Gambia Georgia Germany Ghana Gibraltar Greece Greenland Grenada Guam Guatemala Guernsey Guinea Guinea-Bissau Guyana Haiti Heard Island and McDonald Islands Herzegovina Honduras Hungary Iceland India Indonesia Iraq Ireland Isle of Man Israel Italy Jamaica Japan Jersey Jordan Kazakhstan Kenya Kiribati Kosovo Kyrgyzstan Kuwait Laos Latvia Lebanon Lesotho Liberia Libya Liechtenstein Lithuania Luxembourg Madagascar Malawi Malaysia Maldives Mali Malta Marshall Islands Mauritania Mauritius Mexico Micronesia Mongolia Montenegro Montserrat Morocco Mozambique Namibia Nauru Nepal Netherlands New Caledonia New Zealand Nicaragua Niger Nigeria Niue Norfolk Island North Macedonia Northern Mariana Islands Norway Oman Pakistan Palau Palestine Panama Papua New Guinea Paraguay Peru Philippines Pitcairn Islands Poland Portugal Puerto Rico Qatar Republic of Cyprus Republic of the Congo Romania Rwanda Saint Barthélemy Saint Kitts and Nevis Saint Lucia Saint Pierre and Miquelon Saint Vincent and the Grenadines Saint Helena, Ascension and Tristan da Cunha Samoa São Tomé and Príncipe Saudi Arabia Senegal Serbia Seychelles Sierra Leone Singapore Slovakia Slovenia Solomon Islands Somalia South Africa South Georgia and the South Sandwich Islands South Korea South Sudan Spain Sri Lanka Sudan Suriname Sweden Switzerland Taiwan Tajikistan Tanzania Thailand Timor-Leste Togo Tokelau Tonga Trinidad and Tobago Tunisia Türkiye Turkmenistan Turks and Caicos Islands Tuvalu Uganda Ukraine United Kingdom United Arab Emirates United States United 
States Minor Outlying Islands U.S. Virgin Islands Uruguay Uzbekistan Vanuatu Venezuela Vietnam Wallis and Futuna Western Sahara Yemen Zambia Zimbabwe Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
text_content/docs_batch-api_106cd8d4.txt ADDED
@@ -0,0 +1,5 @@
1
+ URL: https://ai.google.dev/gemini-api/docs/batch-api#batch-embeddings
2
+ Title: Batch API  |  Gemini API  |  Google AI for Developers
3
+ ==================================================
4
+
5
+ Batch API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Batch API The Gemini Batch API is designed to process large volumes of requests asynchronously at 50% of the standard cost . The target turnaround time is 24 hours, but in majority of cases, it is much quicker. Use Batch API for large-scale, non-urgent tasks such as data pre-processing or running evaluations where an immediate response is not required. Creating a batch job You have two ways to submit your requests in Batch API: Inline Requests : A list of GenerateContentRequest objects directly included in your batch creation request. This is suitable for smaller batches that keep the total request size under 20MB. The output returned from the model is a list of inlineResponse objects. Input File : A JSON Lines (JSONL) file where each line contains a complete GenerateContentRequest object. This method is recommended for larger requests. The output returned from the model is a JSONL file where each line is either a GenerateContentResponse or a status object. Inline requests For a small number of requests, you can directly embed the GenerateContentRequest objects within your BatchGenerateContentRequest . The following example calls the BatchGenerateContent method with inline requests: Python from google import genai from google.genai import types client = genai . Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Tell me a one-sentence joke.' }], 'role' : 'user' }] }, { 'contents' : [{ 'parts' : [{ 'text' : 'Why is the sky blue?' }], 'role' : 'user' }] } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "inlined-requests-job-1" , }, ) print ( f "Created batch job: { inline_batch_job . name } " ) JavaScript import { GoogleGenAI } from '@google/genai' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const inlinedRequests = [ { contents : [{ parts : [{ text : 'Tell me a one-sentence joke.' }], role : 'user' }] }, { contents : [{ parts : [{ 'text' : 'Why is the sky blue?' }], role : 'user' }] } ] const response = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : inlinedRequests , config : { displayName : 'inlined-requests-job-1' , } }); console . log ( response ); REST curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -X POST \ -H "Content-Type:application/json" \ -d '{ "batch": { "display_name": "my-batch-requests", "input_config": { "requests": { "requests": [ { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-1" } }, { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-2" } } ] } } } }' Input file For larger sets of requests, prepare a JSON Lines (JSONL) file. 
Each line in this file must be a JSON object containing a user-defined key and a request object, where the request is a valid GenerateContentRequest object. The user-defined key is used in the response to indicate which output is the result of which request. For example, the request with the key defined as request-1 will have its response annotated with the same key name. This file is uploaded using the File API . The maximum allowed file size for an input file is 2GB. The following is an example of a JSONL file. You can save it in a file named my-batch-requests.json : { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}], "generation_config" : { "temperature" : 0.7 }}} { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} Similarly to inline requests, you can specify other parameters like system instructions, tools or other configurations in each request JSON. You can upload this file using the File API as shown in the following example. If you are working with multimodal input, you can reference other uploaded files within your JSONL file. Python from google import genai from google.genai import types client = genai . Client () # Create a sample JSONL file with open ( "my-batch-requests.jsonl" , "w" ) as f : requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}]}}, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} ] for req in requests : f . write ( json . dumps ( req ) + " \n " ) # Upload the file to the File API uploaded_file = client . files . upload ( file = 'my-batch-requests.jsonl' , config = types . UploadFileConfig ( display_name = 'my-batch-requests' , mime_type = 'jsonl' ) ) print ( f "Uploaded file: { uploaded_file . name } " ) JavaScript import { GoogleGenAI } from '@google/genai' ; import * as fs from "fs" ; import * as path from "path" ; import { fileURLToPath } from 'url' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const fileName = "my-batch-requests.jsonl" ; // Define the requests const requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }] }] } }, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }] }] } } ]; // Construct the full path to file const __filename = fileURLToPath ( import . meta . url ); const __dirname = path . dirname ( __filename ); const filePath = path . join ( __dirname , fileName ); // __dirname is the directory of the current script async function writeBatchRequestsToFile ( requests , filePath ) { try { // Use a writable stream for efficiency, especially with larger files. const writeStream = fs . createWriteStream ( filePath , { flags : 'w' }); writeStream . on ( 'error' , ( err ) = > { console . error ( `Error writing to file ${ filePath } :` , err ); }); for ( const req of requests ) { writeStream . write ( JSON . stringify ( req ) + '\n' ); } writeStream . end (); console . log ( `Successfully wrote batch requests to ${ filePath } ` ); } catch ( error ) { // This catch block is for errors that might occur before stream setup, // stream errors are handled by the 'error' event. console . 
error ( `An unexpected error occurred:` , error ); } } // Write to a file. writeBatchRequestsToFile ( requests , filePath ); // Upload the file to the File API. const uploadedFile = await ai . files . upload ({ file : 'my-batch-requests.jsonl' , config : { mimeType : 'jsonl' , }}); console . log ( uploadedFile . name ); REST tmp_batch_input_file = batch_input.tmp echo -e '{"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generationConfig": {"temperature": 0.7}}\n{"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}' > batch_input.tmp MIME_TYPE = $( file -b --mime-type " ${ tmp_batch_input_file } " ) NUM_BYTES = $( wc -c < " ${ tmp_batch_input_file } " ) DISPLAY_NAME = BatchInput tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -D " ${ tmp_header_file } " \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/jsonl" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ tmp_batch_input_file } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) The following example calls the BatchGenerateContent method with the input file uploaded using File API: Python # Assumes `uploaded_file` is the file object from the previous step file_batch_job = client . batches . create ( model = "gemini-2.5-flash" , src = uploaded_file . name , config = { 'display_name' : "file-upload-job-1" , }, ) print ( f "Created batch job: { file_batch_job . name } " ) JavaScript // Assumes `uploadedFile` is the file object from the previous step const fileBatchJob = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : uploadedFile . name , config : { displayName : 'file-upload-job-1' , } }); console . log ( fileBatchJob ); REST # Set the File ID taken from the upload response. BATCH_INPUT_FILE = 'files/123456' curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -X POST \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" \ -d "{ 'batch': { 'display_name': 'my-batch-requests', 'input_config': { 'file_name': ' ${ BATCH_INPUT_FILE } ' } } }" When you create a batch job, you will get a job name returned. Use this name for monitoring the job status as well as retrieving the results once the job completes. The following is an example output that contains a job name: Created batch job from file: batches/123456789 Batch embedding support You can use the Batch API to interact with the Embeddings model for higher throughput. To create an embeddings batch job with either inline requests or input files , use the batches.create_embeddings API and specify the embeddings model. Python # Creating an embeddings batch job with an input file request: file_job = client . batches . 
create_embeddings ( model = "gemini-embedding-001" , src = { 'file_name' : uploaded_batch_requests . name }, config = { 'display_name' : "Input embeddings batch" }, ) # Creating an embeddings batch job with an inline request: batch_job = client . batches . create_embeddings ( model = "gemini-embedding-001" , # For a predefined list of requests `inlined_requests` src = { 'inlined_requests' : inlined_requests }, config = { 'display_name' : "Inlined embeddings batch" }, ) JavaScript // Creating an embeddings batch job with an input file request: let fileJob ; fileJob = await client . batches . createEmbeddings ({ model : 'gemini-embedding-001' , src : { fileName : uploadedBatchRequests . name }, config : { displayName : 'Input embeddings batch' }, }); console . log ( `Created batch job: ${ fileJob . name } ` ); // Creating an embeddings batch job with an inline request: let batchJob ; batchJob = await client . batches . createEmbeddings ({ model : 'gemini-embedding-001' , // For a predefined a list of requests `inlinedRequests` src : { inlinedRequests : inlinedRequests }, config : { displayName : 'Inlined embeddings batch' }, }); console . log ( `Created batch job: ${ batchJob . name } ` ); Read the Embeddings section in the Batch API cookbook for more examples. Request configuration You can include any request configurations you would use in a standard non-batch request. For example, you could specify the temperature, system instructions or even pass in other modalities. The following example shows an example inline request that contains a system instruction for one of the requests: Python inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cloud.' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cat.' }]}], 'system_instructions' : { 'parts' : [{ 'text' : 'You are a cat. Your name is Neko.' }]}} ] JavaScript inlineRequestsList = [ { contents : [{ parts : [{ text : 'Write a short poem about a cloud.' }]}]}, { contents : [{ parts : [{ text : 'Write a short poem about a cat.' }]}], systemInstructions : { parts : [{ text : 'You are a cat. Your name is Neko.' }]}} ] Similarly can specify tools to use for a request. The following example shows a request that enables the Google Search tool : Python inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 1998?' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 2025?' }]}], 'tools' : [{ 'google_search ' : {}}]} ] JavaScript inlineRequestsList = [ { contents : [{ parts : [{ text : 'Who won the euro 1998?' }]}]}, { contents : [{ parts : [{ text : 'Who won the euro 2025?' }]}], tools : [{ googleSearch : {}}]} ] You can specify structured output as well. The following example shows how to specify for your batch requests. Python from google import genai from pydantic import BaseModel , TypeAdapter class Recipe ( BaseModel ): recipe_name : str ingredients : list [ str ] client = genai . Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'List a few popular cookie recipes, and include the amounts of ingredients.' }], 'role' : 'user' }], 'config' : { 'response_mime_type' : 'application/json' , 'response_schema' : list [ Recipe ] } }, { 'contents' : [{ 'parts' : [{ 'text' : 'List a few popular gluten free cookie recipes, and include the amounts of ingredients.' 
}], 'role' : 'user' }], 'config' : { 'response_mime_type' : 'application/json' , 'response_schema' : list [ Recipe ] } } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "structured-output-job-1" }, ) # wait for the job to finish job_name = inline_batch_job . name print ( f "Polling status for job: { job_name } " ) while True : batch_job_inline = client . batches . get ( name = job_name ) if batch_job_inline . state . name in ( 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' ): break print ( f "Job not finished. Current state: { batch_job_inline . state . name } . Waiting 30 seconds..." ) time . sleep ( 30 ) print ( f "Job finished with state: { batch_job_inline . state . name } " ) # print the response for i , inline_response in enumerate ( batch_job_inline . dest . inlined_responses , start = 1 ): print ( f " \n --- Response { i } ---" ) # Check for a successful response if inline_response . response : # The .text property is a shortcut to the generated text. print ( inline_response . response . text ) JavaScript import { GoogleGenAI , Type } from '@google/genai' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const inlinedRequests = [ { contents : [{ parts : [{ text : 'List a few popular cookie recipes, and include the amounts of ingredients.' }], role : 'user' }], config : { responseMimeType : 'application/json' , responseSchema : { type : Type . ARRAY , items : { type : Type . OBJECT , properties : { 'recipeName' : { type : Type . STRING , description : 'Name of the recipe' , nullable : false , }, 'ingredients' : { type : Type . ARRAY , items : { type : Type . STRING , description : 'Ingredients of the recipe' , nullable : false , }, }, }, required : [ 'recipeName' ], }, }, } }, { contents : [{ parts : [{ text : 'List a few popular gluten free cookie recipes, and include the amounts of ingredients.' }], role : 'user' }], config : { responseMimeType : 'application/json' , responseSchema : { type : Type . ARRAY , items : { type : Type . OBJECT , properties : { 'recipeName' : { type : Type . STRING , description : 'Name of the recipe' , nullable : false , }, 'ingredients' : { type : Type . ARRAY , items : { type : Type . STRING , description : 'Ingredients of the recipe' , nullable : false , }, }, }, required : [ 'recipeName' ], }, }, } } ] const inlinedBatchJob = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : inlinedRequests , config : { displayName : 'inlined-requests-job-1' , } }); Monitoring job status Use the operation name obtained when creating the batch job to poll its status. The state field of the batch job will indicate its current status. A batch job can be in one of the following states: JOB_STATE_PENDING : The job has been created and is waiting to be processed by the service. JOB_STATE_RUNNING : The job is in progress. JOB_STATE_SUCCEEDED : The job completed successfully. You can now retrieve the results. JOB_STATE_FAILED : The job failed. Check the error details for more information. JOB_STATE_CANCELLED : The job was cancelled by the user. JOB_STATE_EXPIRED : The job has expired because it was running or pending for more than 48 hours. The job will not have any results to retrieve. You can try submitting the job again or splitting up the requests into smaller batches. You can poll the job status periodically to check for completion. 
Python # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" # (e.g. 'batches/your-batch-id') batch_job = client . batches . get ( name = job_name ) completed_states = set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' , ]) print ( f "Polling status for job: { job_name } " ) batch_job = client . batches . get ( name = job_name ) # Initial get while batch_job . state . name not in completed_states : print ( f "Current state: { batch_job . state . name } " ) time . sleep ( 30 ) # Wait for 30 seconds before polling again batch_job = client . batches . get ( name = job_name ) print ( f "Job finished with state: { batch_job . state . name } " ) if batch_job . state . name == 'JOB_STATE_FAILED' : print ( f "Error: { batch_job . error } " ) JavaScript // Use the name of the job you want to check // e.g., inlinedBatchJob.name from the previous step let batchJob ; const completedStates = new Set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' , ]); try { batchJob = await ai . batches . get ({ name : inlinedBatchJob . name }); while ( ! completedStates . has ( batchJob . state )) { console . log ( `Current state: ${ batchJob . state } ` ); // Wait for 30 seconds before polling again await new Promise ( resolve = > setTimeout ( resolve , 30000 )); batchJob = await client . batches . get ({ name : batchJob . name }); } console . log ( `Job finished with state: ${ batchJob . state } ` ); if ( batchJob . state === 'JOB_STATE_FAILED' ) { // The exact structure of `error` might vary depending on the SDK // This assumes `error` is an object with a `message` property. console . error ( `Error: ${ batchJob . state } ` ); } } catch ( error ) { console . error ( `An error occurred while polling job ${ batchJob . name } :` , error ); } Retrieving results Once the job status indicates your batch job has succeeded, the results are available in the response field. Python import json # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" batch_job = client . batches . get ( name = job_name ) if batch_job . state . name == 'JOB_STATE_SUCCEEDED' : # If batch job was created with a file if batch_job . dest and batch_job . dest . file_name : # Results are in a file result_file_name = batch_job . dest . file_name print ( f "Results are in file: { result_file_name } " ) print ( "Downloading result file content..." ) file_content = client . files . download ( file = result_file_name ) # Process file_content (bytes) as needed print ( file_content . decode ( 'utf-8' )) # If batch job was created with inline request # (for embeddings, use batch_job.dest.inlined_embed_content_responses) elif batch_job . dest and batch_job . dest . inlined_responses : # Results are inline print ( "Results are inline:" ) for i , inline_response in enumerate ( batch_job . dest . inlined_responses ): print ( f "Response { i + 1 } :" ) if inline_response . response : # Accessing response, structure may vary. try : print ( inline_response . response . text ) except AttributeError : print ( inline_response . response ) # Fallback elif inline_response . error : print ( f "Error: { inline_response . error } " ) else : print ( "No results found (neither file nor inline)." ) else : print ( f "Job did not succeed. Final state: { batch_job . state . name } " ) if batch_job . error : print ( f "Error: { batch_job . 
error } " ) JavaScript // Use the name of the job you want to check // e.g., inlinedBatchJob.name from the previous step const jobName = "YOUR_BATCH_JOB_NAME" ; try { const batchJob = await ai . batches . get ({ name : jobName }); if ( batchJob . state === 'JOB_STATE_SUCCEEDED' ) { console . log ( 'Found completed batch:' , batchJob . displayName ); console . log ( batchJob ); // If batch job was created with a file destination if ( batchJob . dest ? . fileName ) { const resultFileName = batchJob . dest . fileName ; console . log ( `Results are in file: ${ resultFileName } ` ); console . log ( "Downloading result file content..." ); const fileContentBuffer = await ai . files . download ({ file : resultFileName }); // Process fileContentBuffer (Buffer) as needed console . log ( fileContentBuffer . toString ( 'utf-8' )); } // If batch job was created with inline responses else if ( batchJob . dest ? . inlinedResponses ) { console . log ( "Results are inline:" ); for ( let i = 0 ; i < batchJob . dest . inlinedResponses . length ; i ++ ) { const inlineResponse = batchJob . dest . inlinedResponses [ i ]; console . log ( `Response ${ i + 1 } :` ); if ( inlineResponse . response ) { // Accessing response, structure may vary. if ( inlineResponse . response . text !== undefined ) { console . log ( inlineResponse . response . text ); } else { console . log ( inlineResponse . response ); // Fallback } } else if ( inlineResponse . error ) { console . error ( `Error: ${ inlineResponse . error } ` ); } } } // If batch job was an embedding batch with inline responses else if ( batchJob . dest ? . inlinedEmbedContentResponses ) { console . log ( "Embedding results found inline:" ); for ( let i = 0 ; i < batchJob . dest . inlinedEmbedContentResponses . length ; i ++ ) { const inlineResponse = batchJob . dest . inlinedEmbedContentResponses [ i ]; console . log ( `Response ${ i + 1 } :` ); if ( inlineResponse . response ) { console . log ( inlineResponse . response ); } else if ( inlineResponse . error ) { console . error ( `Error: ${ inlineResponse . error } ` ); } } } else { console . log ( "No results found (neither file nor inline)." ); } } else { console . log ( `Job did not succeed. Final state: ${ batchJob . state } ` ); if ( batchJob . error ) { console . error ( `Error: ${ typeof batchJob . error === 'string' ? batchJob . error : batchJob . error . message || JSON . stringify ( batchJob . error ) } ` ); } } } catch ( error ) { console . 
error ( `An error occurred while processing job ${ jobName } :` , error ); } REST BATCH_NAME = "batches/123456" # Your batch job name curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null > batch_status.json if jq -r '.done' batch_status.json | grep -q "false" ; then echo "Batch has not finished processing" fi batch_state = $( jq -r '.metadata.state' batch_status.json ) if [[ $batch_state = "JOB_STATE_SUCCEEDED" ]] ; then if [[ $( jq '.response | has("inlinedResponses")' batch_status.json ) = "true" ]] ; then jq -r '.response.inlinedResponses' batch_status.json exit fi responses_file_name = $( jq -r '.response.responsesFile' batch_status.json ) curl https://generativelanguage.googleapis.com/download/v1beta/ $responses_file_name :download?alt = media \ -H "x-goog-api-key: $GEMINI_API_KEY " 2 > /dev/null elif [[ $batch_state = "JOB_STATE_FAILED" ]] ; then jq '.error' batch_status.json elif [[ $batch_state == "JOB_STATE_CANCELLED" ]] ; then echo "Batch was cancelled by the user" elif [[ $batch_state == "JOB_STATE_EXPIRED" ]] ; then echo "Batch expired after 48 hours" fi Cancelling a batch job You can cancel an ongoing batch job using its name. When a job is canceled, it stops processing new requests. Python # Cancel a batch job client . batches . cancel ( name = batch_job_to_cancel . name ) JavaScript await ai . batches . cancel ({ name : batchJobToCancel . name }); REST BATCH_NAME = "batches/123456" # Your batch job name # Cancel the batch curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :cancel \ -H "x-goog-api-key: $GEMINI_API_KEY " \ # Confirm that the status of the batch after cancellation is JOB_STATE_CANCELLED curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null | jq -r '.metadata.state' Deleting a batch job You can delete an existing batch job using its name. When a job is deleted, it stops processing new requests and is removed from the list of batch jobs. Python # Delete a batch job client . batches . delete ( name = batch_job_to_delete . name ) JavaScript await ai . batches . delete ({ name : batchJobToDelete . name }); REST BATCH_NAME = "batches/123456" # Your batch job name # Delete the batch job curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :delete \ -H "x-goog-api-key: $GEMINI_API_KEY " Technical details Supported models: Batch API supports a range of Gemini models. Refer to the Models page for each model's support of Batch API. The supported modalities for Batch API are the same as what's supported on the interactive (or non-batch) API. Pricing: Batch API usage is priced at 50% of the standard interactive API cost for the equivalent model. See the pricing page for details. Refer to the rate limits page for details on rate limits for this feature. Service Level Objective (SLO): Batch jobs are designed to complete within a 24-hour turnaround time. Many jobs may complete much faster depending on their size and current system load. Caching: Context caching is enabled for batch requests. If a request in your batch results in a cache hit, the cached tokens are priced the same as for non-batch API traffic. Best practices Use input files for large requests: For a large number of requests, always use the file input method for better manageability and to avoid hitting request size limits for the BatchGenerateContent call itself. 
Note that there's a 2GB file size limit per input file. Error handling: Check the batchStats for failedRequestCount after a job completes. If using file output, parse each line to check if it's a GenerateContentResponse or a status object indicating an error for that specific request. See the troubleshooting guide for a complete set of error codes. Submit jobs once: The creation of a batch job is not idempotent. If you send the same creation request twice, two separate batch jobs will be created. Break up very large batches: While the target turnaround time is 24 hours, actual processing time can vary based on system load and job size. For large jobs, consider breaking them into smaller batches if intermediate results are needed sooner. What's next Check out the Batch API notebook for more examples. The OpenAI compatibility layer supports Batch API. Read the examples on the OpenAI Compatibility page. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-12 UTC.
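To make the error-handling advice above concrete, the Python sketch below walks a downloaded JSONL results file and tallies per-request outcomes. It is a sketch only: the job name is a placeholder, and the per-line field names (key, response, error/status) are assumptions that should be checked against your own output.

# Sketch: tally per-request outcomes from a Batch API JSONL results file.
import json
from google import genai

client = genai.Client()
# Placeholder job name; substitute the name returned when you created the job.
batch_job = client.batches.get(name="batches/123456789")
file_content = client.files.download(file=batch_job.dest.file_name)

succeeded, failed = 0, 0
for line in file_content.decode("utf-8").splitlines():
    if not line.strip():
        continue
    record = json.loads(line)
    key = record.get("key", "<no key>")
    if record.get("response"):
        succeeded += 1
    else:
        failed += 1
        # Failure details are assumed to be carried in an error/status object.
        print(f"{key} failed: {record.get('error') or record.get('status')}")

print(f"{succeeded} succeeded, {failed} failed")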
text_content/docs_batch-api_8dc15f59.txt ADDED
@@ -0,0 +1,5 @@
1
+ URL: https://ai.google.dev/gemini-api/docs/batch-api#input-file
2
+ Title: Batch API  |  Gemini API  |  Google AI for Developers
3
+ ==================================================
4
+
5
+ Batch API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Batch API The Gemini Batch API is designed to process large volumes of requests asynchronously at 50% of the standard cost . The target turnaround time is 24 hours, but in majority of cases, it is much quicker. Use Batch API for large-scale, non-urgent tasks such as data pre-processing or running evaluations where an immediate response is not required. Creating a batch job You have two ways to submit your requests in Batch API: Inline Requests : A list of GenerateContentRequest objects directly included in your batch creation request. This is suitable for smaller batches that keep the total request size under 20MB. The output returned from the model is a list of inlineResponse objects. Input File : A JSON Lines (JSONL) file where each line contains a complete GenerateContentRequest object. This method is recommended for larger requests. The output returned from the model is a JSONL file where each line is either a GenerateContentResponse or a status object. Inline requests For a small number of requests, you can directly embed the GenerateContentRequest objects within your BatchGenerateContentRequest . The following example calls the BatchGenerateContent method with inline requests: Python from google import genai from google.genai import types client = genai . Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Tell me a one-sentence joke.' }], 'role' : 'user' }] }, { 'contents' : [{ 'parts' : [{ 'text' : 'Why is the sky blue?' }], 'role' : 'user' }] } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "inlined-requests-job-1" , }, ) print ( f "Created batch job: { inline_batch_job . name } " ) JavaScript import { GoogleGenAI } from '@google/genai' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const inlinedRequests = [ { contents : [{ parts : [{ text : 'Tell me a one-sentence joke.' }], role : 'user' }] }, { contents : [{ parts : [{ 'text' : 'Why is the sky blue?' }], role : 'user' }] } ] const response = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : inlinedRequests , config : { displayName : 'inlined-requests-job-1' , } }); console . log ( response ); REST curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -X POST \ -H "Content-Type:application/json" \ -d '{ "batch": { "display_name": "my-batch-requests", "input_config": { "requests": { "requests": [ { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-1" } }, { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-2" } } ] } } } }' Input file For larger sets of requests, prepare a JSON Lines (JSONL) file. 
Each line in this file must be a JSON object containing a user-defined key and a request object, where the request is a valid GenerateContentRequest object. The user-defined key is used in the response to indicate which output is the result of which request. For example, the request with the key defined as request-1 will have its response annotated with the same key name. This file is uploaded using the File API . The maximum allowed file size for an input file is 2GB. The following is an example of a JSONL file. You can save it in a file named my-batch-requests.json : { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}], "generation_config" : { "temperature" : 0.7 }}} { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} Similarly to inline requests, you can specify other parameters like system instructions, tools or other configurations in each request JSON. You can upload this file using the File API as shown in the following example. If you are working with multimodal input, you can reference other uploaded files within your JSONL file. Python from google import genai from google.genai import types client = genai . Client () # Create a sample JSONL file with open ( "my-batch-requests.jsonl" , "w" ) as f : requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}]}}, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} ] for req in requests : f . write ( json . dumps ( req ) + " \n " ) # Upload the file to the File API uploaded_file = client . files . upload ( file = 'my-batch-requests.jsonl' , config = types . UploadFileConfig ( display_name = 'my-batch-requests' , mime_type = 'jsonl' ) ) print ( f "Uploaded file: { uploaded_file . name } " ) JavaScript import { GoogleGenAI } from '@google/genai' ; import * as fs from "fs" ; import * as path from "path" ; import { fileURLToPath } from 'url' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const fileName = "my-batch-requests.jsonl" ; // Define the requests const requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }] }] } }, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }] }] } } ]; // Construct the full path to file const __filename = fileURLToPath ( import . meta . url ); const __dirname = path . dirname ( __filename ); const filePath = path . join ( __dirname , fileName ); // __dirname is the directory of the current script async function writeBatchRequestsToFile ( requests , filePath ) { try { // Use a writable stream for efficiency, especially with larger files. const writeStream = fs . createWriteStream ( filePath , { flags : 'w' }); writeStream . on ( 'error' , ( err ) = > { console . error ( `Error writing to file ${ filePath } :` , err ); }); for ( const req of requests ) { writeStream . write ( JSON . stringify ( req ) + '\n' ); } writeStream . end (); console . log ( `Successfully wrote batch requests to ${ filePath } ` ); } catch ( error ) { // This catch block is for errors that might occur before stream setup, // stream errors are handled by the 'error' event. console . 
error ( `An unexpected error occurred:` , error ); } } // Write to a file. writeBatchRequestsToFile ( requests , filePath ); // Upload the file to the File API. const uploadedFile = await ai . files . upload ({ file : 'my-batch-requests.jsonl' , config : { mimeType : 'jsonl' , }}); console . log ( uploadedFile . name ); REST tmp_batch_input_file = batch_input.tmp echo -e '{"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generationConfig": {"temperature": 0.7}}\n{"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}' > batch_input.tmp MIME_TYPE = $( file -b --mime-type " ${ tmp_batch_input_file } " ) NUM_BYTES = $( wc -c < " ${ tmp_batch_input_file } " ) DISPLAY_NAME = BatchInput tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -D " ${ tmp_header_file } " \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/jsonl" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ tmp_batch_input_file } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) The following example calls the BatchGenerateContent method with the input file uploaded using File API: Python # Assumes `uploaded_file` is the file object from the previous step file_batch_job = client . batches . create ( model = "gemini-2.5-flash" , src = uploaded_file . name , config = { 'display_name' : "file-upload-job-1" , }, ) print ( f "Created batch job: { file_batch_job . name } " ) JavaScript // Assumes `uploadedFile` is the file object from the previous step const fileBatchJob = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : uploadedFile . name , config : { displayName : 'file-upload-job-1' , } }); console . log ( fileBatchJob ); REST # Set the File ID taken from the upload response. BATCH_INPUT_FILE = 'files/123456' curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -X POST \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" \ -d "{ 'batch': { 'display_name': 'my-batch-requests', 'input_config': { 'file_name': ' ${ BATCH_INPUT_FILE } ' } } }" When you create a batch job, you will get a job name returned. Use this name for monitoring the job status as well as retrieving the results once the job completes. The following is an example output that contains a job name: Created batch job from file: batches/123456789 Batch embedding support You can use the Batch API to interact with the Embeddings model for higher throughput. To create an embeddings batch job with either inline requests or input files , use the batches.create_embeddings API and specify the embeddings model. Python # Creating an embeddings batch job with an input file request: file_job = client . batches . 
create_embeddings ( model = "gemini-embedding-001" , src = { 'file_name' : uploaded_batch_requests . name }, config = { 'display_name' : "Input embeddings batch" }, ) # Creating an embeddings batch job with an inline request: batch_job = client . batches . create_embeddings ( model = "gemini-embedding-001" , # For a predefined list of requests `inlined_requests` src = { 'inlined_requests' : inlined_requests }, config = { 'display_name' : "Inlined embeddings batch" }, ) JavaScript // Creating an embeddings batch job with an input file request: let fileJob ; fileJob = await client . batches . createEmbeddings ({ model : 'gemini-embedding-001' , src : { fileName : uploadedBatchRequests . name }, config : { displayName : 'Input embeddings batch' }, }); console . log ( `Created batch job: ${ fileJob . name } ` ); // Creating an embeddings batch job with an inline request: let batchJob ; batchJob = await client . batches . createEmbeddings ({ model : 'gemini-embedding-001' , // For a predefined list of requests `inlinedRequests` src : { inlinedRequests : inlinedRequests }, config : { displayName : 'Inlined embeddings batch' }, }); console . log ( `Created batch job: ${ batchJob . name } ` ); Read the Embeddings section in the Batch API cookbook for more examples. Request configuration You can include any request configurations you would use in a standard non-batch request. For example, you could specify the temperature, system instructions or even pass in other modalities. The following example shows an inline request that contains a system instruction for one of the requests: Python inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cloud.' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cat.' }]}], 'system_instructions' : { 'parts' : [{ 'text' : 'You are a cat. Your name is Neko.' }]}} ] JavaScript inlineRequestsList = [ { contents : [{ parts : [{ text : 'Write a short poem about a cloud.' }]}]}, { contents : [{ parts : [{ text : 'Write a short poem about a cat.' }]}], systemInstructions : { parts : [{ text : 'You are a cat. Your name is Neko.' }]}} ] Similarly, you can specify tools to use for a request. The following example shows a request that enables the Google Search tool : Python inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 1998?' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 2025?' }]}], 'tools' : [{ 'google_search' : {}}]} ] JavaScript inlineRequestsList = [ { contents : [{ parts : [{ text : 'Who won the euro 1998?' }]}]}, { contents : [{ parts : [{ text : 'Who won the euro 2025?' }]}], tools : [{ googleSearch : {}}]} ] You can specify structured output as well. The following example shows how to specify structured output for your batch requests. Python import time from google import genai from pydantic import BaseModel , TypeAdapter class Recipe ( BaseModel ): recipe_name : str ingredients : list [ str ] client = genai . Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'List a few popular cookie recipes, and include the amounts of ingredients.' }], 'role' : 'user' }], 'config' : { 'response_mime_type' : 'application/json' , 'response_schema' : list [ Recipe ] } }, { 'contents' : [{ 'parts' : [{ 'text' : 'List a few popular gluten free cookie recipes, and include the amounts of ingredients.'
}], 'role' : 'user' }], 'config' : { 'response_mime_type' : 'application/json' , 'response_schema' : list [ Recipe ] } } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "structured-output-job-1" }, ) # wait for the job to finish job_name = inline_batch_job . name print ( f "Polling status for job: { job_name } " ) while True : batch_job_inline = client . batches . get ( name = job_name ) if batch_job_inline . state . name in ( 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' ): break print ( f "Job not finished. Current state: { batch_job_inline . state . name } . Waiting 30 seconds..." ) time . sleep ( 30 ) print ( f "Job finished with state: { batch_job_inline . state . name } " ) # print the response for i , inline_response in enumerate ( batch_job_inline . dest . inlined_responses , start = 1 ): print ( f " \n --- Response { i } ---" ) # Check for a successful response if inline_response . response : # The .text property is a shortcut to the generated text. print ( inline_response . response . text ) JavaScript import { GoogleGenAI , Type } from '@google/genai' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const inlinedRequests = [ { contents : [{ parts : [{ text : 'List a few popular cookie recipes, and include the amounts of ingredients.' }], role : 'user' }], config : { responseMimeType : 'application/json' , responseSchema : { type : Type . ARRAY , items : { type : Type . OBJECT , properties : { 'recipeName' : { type : Type . STRING , description : 'Name of the recipe' , nullable : false , }, 'ingredients' : { type : Type . ARRAY , items : { type : Type . STRING , description : 'Ingredients of the recipe' , nullable : false , }, }, }, required : [ 'recipeName' ], }, }, } }, { contents : [{ parts : [{ text : 'List a few popular gluten free cookie recipes, and include the amounts of ingredients.' }], role : 'user' }], config : { responseMimeType : 'application/json' , responseSchema : { type : Type . ARRAY , items : { type : Type . OBJECT , properties : { 'recipeName' : { type : Type . STRING , description : 'Name of the recipe' , nullable : false , }, 'ingredients' : { type : Type . ARRAY , items : { type : Type . STRING , description : 'Ingredients of the recipe' , nullable : false , }, }, }, required : [ 'recipeName' ], }, }, } } ] const inlinedBatchJob = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : inlinedRequests , config : { displayName : 'inlined-requests-job-1' , } }); Monitoring job status Use the operation name obtained when creating the batch job to poll its status. The state field of the batch job will indicate its current status. A batch job can be in one of the following states: JOB_STATE_PENDING : The job has been created and is waiting to be processed by the service. JOB_STATE_RUNNING : The job is in progress. JOB_STATE_SUCCEEDED : The job completed successfully. You can now retrieve the results. JOB_STATE_FAILED : The job failed. Check the error details for more information. JOB_STATE_CANCELLED : The job was cancelled by the user. JOB_STATE_EXPIRED : The job has expired because it was running or pending for more than 48 hours. The job will not have any results to retrieve. You can try submitting the job again or splitting up the requests into smaller batches. You can poll the job status periodically to check for completion. 
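Before the full polling examples that follow, here is a minimal sketch of a reusable polling helper with a timeout. It is illustrative only: it assumes the same google-genai Python client and the client.batches.get call used throughout this page, and the helper name and timeout values are not part of the official API.

import time

# Terminal states listed above; a job in any of these states will not change further.
TERMINAL_STATES = {
    'JOB_STATE_SUCCEEDED',
    'JOB_STATE_FAILED',
    'JOB_STATE_CANCELLED',
    'JOB_STATE_EXPIRED',
}

def wait_for_batch_job(client, job_name, poll_seconds=30, timeout_seconds=24 * 60 * 60):
    """Polls a batch job until it reaches a terminal state or the timeout expires."""
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        job = client.batches.get(name=job_name)  # same call as in the examples below
        if job.state.name in TERMINAL_STATES:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"Batch job {job_name} did not finish within {timeout_seconds} seconds")

# Example usage (assuming `client` and a job name from a previous step):
# job = wait_for_batch_job(client, inline_batch_job.name)
# print(job.state.name)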
Python # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" # (e.g. 'batches/your-batch-id') batch_job = client . batches . get ( name = job_name ) completed_states = set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' , ]) print ( f "Polling status for job: { job_name } " ) batch_job = client . batches . get ( name = job_name ) # Initial get while batch_job . state . name not in completed_states : print ( f "Current state: { batch_job . state . name } " ) time . sleep ( 30 ) # Wait for 30 seconds before polling again batch_job = client . batches . get ( name = job_name ) print ( f "Job finished with state: { batch_job . state . name } " ) if batch_job . state . name == 'JOB_STATE_FAILED' : print ( f "Error: { batch_job . error } " ) JavaScript // Use the name of the job you want to check // e.g., inlinedBatchJob.name from the previous step let batchJob ; const completedStates = new Set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' , ]); try { batchJob = await ai . batches . get ({ name : inlinedBatchJob . name }); while ( ! completedStates . has ( batchJob . state )) { console . log ( `Current state: ${ batchJob . state } ` ); // Wait for 30 seconds before polling again await new Promise ( resolve = > setTimeout ( resolve , 30000 )); batchJob = await ai . batches . get ({ name : batchJob . name }); } console . log ( `Job finished with state: ${ batchJob . state } ` ); if ( batchJob . state === 'JOB_STATE_FAILED' ) { // The exact structure of `error` might vary depending on the SDK // This assumes `error` is an object with a `message` property. console . error ( `Error: ${ batchJob . error ? . message || JSON . stringify ( batchJob . error ) } ` ); } } catch ( error ) { console . error ( `An error occurred while polling job ${ inlinedBatchJob . name } :` , error ); } Retrieving results Once the job status indicates your batch job has succeeded, the results are available in the response field. Python import json # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" batch_job = client . batches . get ( name = job_name ) if batch_job . state . name == 'JOB_STATE_SUCCEEDED' : # If batch job was created with a file if batch_job . dest and batch_job . dest . file_name : # Results are in a file result_file_name = batch_job . dest . file_name print ( f "Results are in file: { result_file_name } " ) print ( "Downloading result file content..." ) file_content = client . files . download ( file = result_file_name ) # Process file_content (bytes) as needed print ( file_content . decode ( 'utf-8' )) # If batch job was created with inline request # (for embeddings, use batch_job.dest.inlined_embed_content_responses) elif batch_job . dest and batch_job . dest . inlined_responses : # Results are inline print ( "Results are inline:" ) for i , inline_response in enumerate ( batch_job . dest . inlined_responses ): print ( f "Response { i + 1 } :" ) if inline_response . response : # Accessing response, structure may vary. try : print ( inline_response . response . text ) except AttributeError : print ( inline_response . response ) # Fallback elif inline_response . error : print ( f "Error: { inline_response . error } " ) else : print ( "No results found (neither file nor inline)." ) else : print ( f "Job did not succeed. Final state: { batch_job . state . name } " ) if batch_job . error : print ( f "Error: { batch_job .
error } " ) JavaScript // Use the name of the job you want to check // e.g., inlinedBatchJob.name from the previous step const jobName = "YOUR_BATCH_JOB_NAME" ; try { const batchJob = await ai . batches . get ({ name : jobName }); if ( batchJob . state === 'JOB_STATE_SUCCEEDED' ) { console . log ( 'Found completed batch:' , batchJob . displayName ); console . log ( batchJob ); // If batch job was created with a file destination if ( batchJob . dest ? . fileName ) { const resultFileName = batchJob . dest . fileName ; console . log ( `Results are in file: ${ resultFileName } ` ); console . log ( "Downloading result file content..." ); const fileContentBuffer = await ai . files . download ({ file : resultFileName }); // Process fileContentBuffer (Buffer) as needed console . log ( fileContentBuffer . toString ( 'utf-8' )); } // If batch job was created with inline responses else if ( batchJob . dest ? . inlinedResponses ) { console . log ( "Results are inline:" ); for ( let i = 0 ; i < batchJob . dest . inlinedResponses . length ; i ++ ) { const inlineResponse = batchJob . dest . inlinedResponses [ i ]; console . log ( `Response ${ i + 1 } :` ); if ( inlineResponse . response ) { // Accessing response, structure may vary. if ( inlineResponse . response . text !== undefined ) { console . log ( inlineResponse . response . text ); } else { console . log ( inlineResponse . response ); // Fallback } } else if ( inlineResponse . error ) { console . error ( `Error: ${ inlineResponse . error } ` ); } } } // If batch job was an embedding batch with inline responses else if ( batchJob . dest ? . inlinedEmbedContentResponses ) { console . log ( "Embedding results found inline:" ); for ( let i = 0 ; i < batchJob . dest . inlinedEmbedContentResponses . length ; i ++ ) { const inlineResponse = batchJob . dest . inlinedEmbedContentResponses [ i ]; console . log ( `Response ${ i + 1 } :` ); if ( inlineResponse . response ) { console . log ( inlineResponse . response ); } else if ( inlineResponse . error ) { console . error ( `Error: ${ inlineResponse . error } ` ); } } } else { console . log ( "No results found (neither file nor inline)." ); } } else { console . log ( `Job did not succeed. Final state: ${ batchJob . state } ` ); if ( batchJob . error ) { console . error ( `Error: ${ typeof batchJob . error === 'string' ? batchJob . error : batchJob . error . message || JSON . stringify ( batchJob . error ) } ` ); } } } catch ( error ) { console . 
error ( `An error occurred while processing job ${ jobName } :` , error ); } REST BATCH_NAME = "batches/123456" # Your batch job name curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null > batch_status.json if jq -r '.done' batch_status.json | grep -q "false" ; then echo "Batch has not finished processing" fi batch_state = $( jq -r '.metadata.state' batch_status.json ) if [[ $batch_state = "JOB_STATE_SUCCEEDED" ]] ; then if [[ $( jq '.response | has("inlinedResponses")' batch_status.json ) = "true" ]] ; then jq -r '.response.inlinedResponses' batch_status.json exit fi responses_file_name = $( jq -r '.response.responsesFile' batch_status.json ) curl https://generativelanguage.googleapis.com/download/v1beta/ $responses_file_name :download?alt = media \ -H "x-goog-api-key: $GEMINI_API_KEY " 2 > /dev/null elif [[ $batch_state = "JOB_STATE_FAILED" ]] ; then jq '.error' batch_status.json elif [[ $batch_state == "JOB_STATE_CANCELLED" ]] ; then echo "Batch was cancelled by the user" elif [[ $batch_state == "JOB_STATE_EXPIRED" ]] ; then echo "Batch expired after 48 hours" fi Cancelling a batch job You can cancel an ongoing batch job using its name. When a job is canceled, it stops processing new requests. Python # Cancel a batch job client . batches . cancel ( name = batch_job_to_cancel . name ) JavaScript await ai . batches . cancel ({ name : batchJobToCancel . name }); REST BATCH_NAME = "batches/123456" # Your batch job name # Cancel the batch curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :cancel \ -H "x-goog-api-key: $GEMINI_API_KEY " \ # Confirm that the status of the batch after cancellation is JOB_STATE_CANCELLED curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null | jq -r '.metadata.state' Deleting a batch job You can delete an existing batch job using its name. When a job is deleted, it stops processing new requests and is removed from the list of batch jobs. Python # Delete a batch job client . batches . delete ( name = batch_job_to_delete . name ) JavaScript await ai . batches . delete ({ name : batchJobToDelete . name }); REST BATCH_NAME = "batches/123456" # Your batch job name # Delete the batch job curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :delete \ -H "x-goog-api-key: $GEMINI_API_KEY " Technical details Supported models: Batch API supports a range of Gemini models. Refer to the Models page for each model's support of Batch API. The supported modalities for Batch API are the same as what's supported on the interactive (or non-batch) API. Pricing: Batch API usage is priced at 50% of the standard interactive API cost for the equivalent model. See the pricing page for details. Refer to the rate limits page for details on rate limits for this feature. Service Level Objective (SLO): Batch jobs are designed to complete within a 24-hour turnaround time. Many jobs may complete much faster depending on their size and current system load. Caching: Context caching is enabled for batch requests. If a request in your batch results in a cache hit, the cached tokens are priced the same as for non-batch API traffic. Best practices Use input files for large requests: For a large number of requests, always use the file input method for better manageability and to avoid hitting request size limits for the BatchGenerateContent call itself. 
Note that there's a 2GB file size limit per input file. Error handling: Check the batchStats for failedRequestCount after a job completes. If using file output, parse each line to check if it's a GenerateContentResponse or a status object indicating an error for that specific request (a minimal parsing sketch follows at the end of this page). See the troubleshooting guide for a complete set of error codes. Submit jobs once: The creation of a batch job is not idempotent. If you send the same creation request twice, two separate batch jobs will be created. Break up very large batches: While the target turnaround time is 24 hours, actual processing time can vary based on system load and job size. For large jobs, consider breaking them into smaller batches if intermediate results are needed sooner. What's next Check out the Batch API notebook for more examples. The OpenAI compatibility layer supports Batch API. Read the examples on the OpenAI Compatibility page. Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-12 UTC.
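As referenced in the error-handling best practice above, the following is a minimal sketch for scanning a downloaded results file. It assumes each JSONL line carries the user-defined key together with either a response object or an error/status object, as described earlier on this page; the exact field names are an assumption and should be checked against your actual output.

import json

def summarize_batch_results(jsonl_bytes):
    """Counts successes and failures in a downloaded Batch API results file."""
    succeeded, failed = 0, 0
    for line in jsonl_bytes.decode('utf-8').splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Assumption: a failed request surfaces an 'error' (status) object,
        # while a successful request carries a 'response' (GenerateContentResponse).
        if record.get('error'):
            failed += 1
            print(f"Request {record.get('key')} failed: {record['error']}")
        else:
            succeeded += 1
    print(f"{succeeded} succeeded, {failed} failed")

# Example usage with the bytes downloaded in the "Retrieving results" section:
# file_content = client.files.download(file=result_file_name)
# summarize_batch_results(file_content)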
text_content/docs_batch-api_94dee6bb.txt ADDED
@@ -0,0 +1,5 @@
 
 
 
 
 
 
1
+ URL: https://ai.google.dev/gemini-api/docs/batch-api#batch-embedding
2
+ Title: Batch API  |  Gemini API  |  Google AI for Developers
3
+ ==================================================
4
+
5
+ Batch API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Batch API The Gemini Batch API is designed to process large volumes of requests asynchronously at 50% of the standard cost . The target turnaround time is 24 hours, but in majority of cases, it is much quicker. Use Batch API for large-scale, non-urgent tasks such as data pre-processing or running evaluations where an immediate response is not required. Creating a batch job You have two ways to submit your requests in Batch API: Inline Requests : A list of GenerateContentRequest objects directly included in your batch creation request. This is suitable for smaller batches that keep the total request size under 20MB. The output returned from the model is a list of inlineResponse objects. Input File : A JSON Lines (JSONL) file where each line contains a complete GenerateContentRequest object. This method is recommended for larger requests. The output returned from the model is a JSONL file where each line is either a GenerateContentResponse or a status object. Inline requests For a small number of requests, you can directly embed the GenerateContentRequest objects within your BatchGenerateContentRequest . The following example calls the BatchGenerateContent method with inline requests: Python from google import genai from google.genai import types client = genai . Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Tell me a one-sentence joke.' }], 'role' : 'user' }] }, { 'contents' : [{ 'parts' : [{ 'text' : 'Why is the sky blue?' }], 'role' : 'user' }] } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "inlined-requests-job-1" , }, ) print ( f "Created batch job: { inline_batch_job . name } " ) JavaScript import { GoogleGenAI } from '@google/genai' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const inlinedRequests = [ { contents : [{ parts : [{ text : 'Tell me a one-sentence joke.' }], role : 'user' }] }, { contents : [{ parts : [{ 'text' : 'Why is the sky blue?' }], role : 'user' }] } ] const response = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : inlinedRequests , config : { displayName : 'inlined-requests-job-1' , } }); console . log ( response ); REST curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -X POST \ -H "Content-Type:application/json" \ -d '{ "batch": { "display_name": "my-batch-requests", "input_config": { "requests": { "requests": [ { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-1" } }, { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-2" } } ] } } } }' Input file For larger sets of requests, prepare a JSON Lines (JSONL) file. 
Each line in this file must be a JSON object containing a user-defined key and a request object, where the request is a valid GenerateContentRequest object. The user-defined key is used in the response to indicate which output is the result of which request. For example, the request with the key defined as request-1 will have its response annotated with the same key name. This file is uploaded using the File API . The maximum allowed file size for an input file is 2GB. The following is an example of a JSONL file. You can save it in a file named my-batch-requests.json : { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}], "generation_config" : { "temperature" : 0.7 }}} { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} Similarly to inline requests, you can specify other parameters like system instructions, tools or other configurations in each request JSON. You can upload this file using the File API as shown in the following example. If you are working with multimodal input, you can reference other uploaded files within your JSONL file. Python from google import genai from google.genai import types client = genai . Client () # Create a sample JSONL file with open ( "my-batch-requests.jsonl" , "w" ) as f : requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}]}}, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} ] for req in requests : f . write ( json . dumps ( req ) + " \n " ) # Upload the file to the File API uploaded_file = client . files . upload ( file = 'my-batch-requests.jsonl' , config = types . UploadFileConfig ( display_name = 'my-batch-requests' , mime_type = 'jsonl' ) ) print ( f "Uploaded file: { uploaded_file . name } " ) JavaScript import { GoogleGenAI } from '@google/genai' ; import * as fs from "fs" ; import * as path from "path" ; import { fileURLToPath } from 'url' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const fileName = "my-batch-requests.jsonl" ; // Define the requests const requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }] }] } }, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }] }] } } ]; // Construct the full path to file const __filename = fileURLToPath ( import . meta . url ); const __dirname = path . dirname ( __filename ); const filePath = path . join ( __dirname , fileName ); // __dirname is the directory of the current script async function writeBatchRequestsToFile ( requests , filePath ) { try { // Use a writable stream for efficiency, especially with larger files. const writeStream = fs . createWriteStream ( filePath , { flags : 'w' }); writeStream . on ( 'error' , ( err ) = > { console . error ( `Error writing to file ${ filePath } :` , err ); }); for ( const req of requests ) { writeStream . write ( JSON . stringify ( req ) + '\n' ); } writeStream . end (); console . log ( `Successfully wrote batch requests to ${ filePath } ` ); } catch ( error ) { // This catch block is for errors that might occur before stream setup, // stream errors are handled by the 'error' event. console . 
error ( `An unexpected error occurred:` , error ); } } // Write to a file. writeBatchRequestsToFile ( requests , filePath ); // Upload the file to the File API. const uploadedFile = await ai . files . upload ({ file : 'my-batch-requests.jsonl' , config : { mimeType : 'jsonl' , }}); console . log ( uploadedFile . name ); REST tmp_batch_input_file = batch_input.tmp echo -e '{"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generationConfig": {"temperature": 0.7}}\n{"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}' > batch_input.tmp MIME_TYPE = $( file -b --mime-type " ${ tmp_batch_input_file } " ) NUM_BYTES = $( wc -c < " ${ tmp_batch_input_file } " ) DISPLAY_NAME = BatchInput tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -D " ${ tmp_header_file } " \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/jsonl" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ tmp_batch_input_file } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) The following example calls the BatchGenerateContent method with the input file uploaded using File API: Python # Assumes `uploaded_file` is the file object from the previous step file_batch_job = client . batches . create ( model = "gemini-2.5-flash" , src = uploaded_file . name , config = { 'display_name' : "file-upload-job-1" , }, ) print ( f "Created batch job: { file_batch_job . name } " ) JavaScript // Assumes `uploadedFile` is the file object from the previous step const fileBatchJob = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : uploadedFile . name , config : { displayName : 'file-upload-job-1' , } }); console . log ( fileBatchJob ); REST # Set the File ID taken from the upload response. BATCH_INPUT_FILE = 'files/123456' curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -X POST \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" \ -d "{ 'batch': { 'display_name': 'my-batch-requests', 'input_config': { 'file_name': ' ${ BATCH_INPUT_FILE } ' } } }" When you create a batch job, you will get a job name returned. Use this name for monitoring the job status as well as retrieving the results once the job completes. The following is an example output that contains a job name: Created batch job from file: batches/123456789 Batch embedding support You can use the Batch API to interact with the Embeddings model for higher throughput. To create an embeddings batch job with either inline requests or input files , use the batches.create_embeddings API and specify the embeddings model. Python # Creating an embeddings batch job with an input file request: file_job = client . batches . 
create_embeddings ( model = "gemini-embedding-001" , src = { 'file_name' : uploaded_batch_requests . name }, config = { 'display_name' : "Input embeddings batch" }, ) # Creating an embeddings batch job with an inline request: batch_job = client . batches . create_embeddings ( model = "gemini-embedding-001" , # For a predefined list of requests `inlined_requests` src = { 'inlined_requests' : inlined_requests }, config = { 'display_name' : "Inlined embeddings batch" }, ) JavaScript // Creating an embeddings batch job with an input file request: let fileJob ; fileJob = await client . batches . createEmbeddings ({ model : 'gemini-embedding-001' , src : { fileName : uploadedBatchRequests . name }, config : { displayName : 'Input embeddings batch' }, }); console . log ( `Created batch job: ${ fileJob . name } ` ); // Creating an embeddings batch job with an inline request: let batchJob ; batchJob = await client . batches . createEmbeddings ({ model : 'gemini-embedding-001' , // For a predefined a list of requests `inlinedRequests` src : { inlinedRequests : inlinedRequests }, config : { displayName : 'Inlined embeddings batch' }, }); console . log ( `Created batch job: ${ batchJob . name } ` ); Read the Embeddings section in the Batch API cookbook for more examples. Request configuration You can include any request configurations you would use in a standard non-batch request. For example, you could specify the temperature, system instructions or even pass in other modalities. The following example shows an example inline request that contains a system instruction for one of the requests: Python inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cloud.' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cat.' }]}], 'system_instructions' : { 'parts' : [{ 'text' : 'You are a cat. Your name is Neko.' }]}} ] JavaScript inlineRequestsList = [ { contents : [{ parts : [{ text : 'Write a short poem about a cloud.' }]}]}, { contents : [{ parts : [{ text : 'Write a short poem about a cat.' }]}], systemInstructions : { parts : [{ text : 'You are a cat. Your name is Neko.' }]}} ] Similarly can specify tools to use for a request. The following example shows a request that enables the Google Search tool : Python inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 1998?' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 2025?' }]}], 'tools' : [{ 'google_search ' : {}}]} ] JavaScript inlineRequestsList = [ { contents : [{ parts : [{ text : 'Who won the euro 1998?' }]}]}, { contents : [{ parts : [{ text : 'Who won the euro 2025?' }]}], tools : [{ googleSearch : {}}]} ] You can specify structured output as well. The following example shows how to specify for your batch requests. Python from google import genai from pydantic import BaseModel , TypeAdapter class Recipe ( BaseModel ): recipe_name : str ingredients : list [ str ] client = genai . Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'List a few popular cookie recipes, and include the amounts of ingredients.' }], 'role' : 'user' }], 'config' : { 'response_mime_type' : 'application/json' , 'response_schema' : list [ Recipe ] } }, { 'contents' : [{ 'parts' : [{ 'text' : 'List a few popular gluten free cookie recipes, and include the amounts of ingredients.' 
}], 'role' : 'user' }], 'config' : { 'response_mime_type' : 'application/json' , 'response_schema' : list [ Recipe ] } } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "structured-output-job-1" }, ) # wait for the job to finish job_name = inline_batch_job . name print ( f "Polling status for job: { job_name } " ) while True : batch_job_inline = client . batches . get ( name = job_name ) if batch_job_inline . state . name in ( 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' ): break print ( f "Job not finished. Current state: { batch_job_inline . state . name } . Waiting 30 seconds..." ) time . sleep ( 30 ) print ( f "Job finished with state: { batch_job_inline . state . name } " ) # print the response for i , inline_response in enumerate ( batch_job_inline . dest . inlined_responses , start = 1 ): print ( f " \n --- Response { i } ---" ) # Check for a successful response if inline_response . response : # The .text property is a shortcut to the generated text. print ( inline_response . response . text ) JavaScript import { GoogleGenAI , Type } from '@google/genai' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const inlinedRequests = [ { contents : [{ parts : [{ text : 'List a few popular cookie recipes, and include the amounts of ingredients.' }], role : 'user' }], config : { responseMimeType : 'application/json' , responseSchema : { type : Type . ARRAY , items : { type : Type . OBJECT , properties : { 'recipeName' : { type : Type . STRING , description : 'Name of the recipe' , nullable : false , }, 'ingredients' : { type : Type . ARRAY , items : { type : Type . STRING , description : 'Ingredients of the recipe' , nullable : false , }, }, }, required : [ 'recipeName' ], }, }, } }, { contents : [{ parts : [{ text : 'List a few popular gluten free cookie recipes, and include the amounts of ingredients.' }], role : 'user' }], config : { responseMimeType : 'application/json' , responseSchema : { type : Type . ARRAY , items : { type : Type . OBJECT , properties : { 'recipeName' : { type : Type . STRING , description : 'Name of the recipe' , nullable : false , }, 'ingredients' : { type : Type . ARRAY , items : { type : Type . STRING , description : 'Ingredients of the recipe' , nullable : false , }, }, }, required : [ 'recipeName' ], }, }, } } ] const inlinedBatchJob = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : inlinedRequests , config : { displayName : 'inlined-requests-job-1' , } }); Monitoring job status Use the operation name obtained when creating the batch job to poll its status. The state field of the batch job will indicate its current status. A batch job can be in one of the following states: JOB_STATE_PENDING : The job has been created and is waiting to be processed by the service. JOB_STATE_RUNNING : The job is in progress. JOB_STATE_SUCCEEDED : The job completed successfully. You can now retrieve the results. JOB_STATE_FAILED : The job failed. Check the error details for more information. JOB_STATE_CANCELLED : The job was cancelled by the user. JOB_STATE_EXPIRED : The job has expired because it was running or pending for more than 48 hours. The job will not have any results to retrieve. You can try submitting the job again or splitting up the requests into smaller batches. You can poll the job status periodically to check for completion. 
Python # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" # (e.g. 'batches/your-batch-id') batch_job = client . batches . get ( name = job_name ) completed_states = set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' , ]) print ( f "Polling status for job: { job_name } " ) batch_job = client . batches . get ( name = job_name ) # Initial get while batch_job . state . name not in completed_states : print ( f "Current state: { batch_job . state . name } " ) time . sleep ( 30 ) # Wait for 30 seconds before polling again batch_job = client . batches . get ( name = job_name ) print ( f "Job finished with state: { batch_job . state . name } " ) if batch_job . state . name == 'JOB_STATE_FAILED' : print ( f "Error: { batch_job . error } " ) JavaScript // Use the name of the job you want to check // e.g., inlinedBatchJob.name from the previous step let batchJob ; const completedStates = new Set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' , ]); try { batchJob = await ai . batches . get ({ name : inlinedBatchJob . name }); while ( ! completedStates . has ( batchJob . state )) { console . log ( `Current state: ${ batchJob . state } ` ); // Wait for 30 seconds before polling again await new Promise ( resolve = > setTimeout ( resolve , 30000 )); batchJob = await client . batches . get ({ name : batchJob . name }); } console . log ( `Job finished with state: ${ batchJob . state } ` ); if ( batchJob . state === 'JOB_STATE_FAILED' ) { // The exact structure of `error` might vary depending on the SDK // This assumes `error` is an object with a `message` property. console . error ( `Error: ${ batchJob . state } ` ); } } catch ( error ) { console . error ( `An error occurred while polling job ${ batchJob . name } :` , error ); } Retrieving results Once the job status indicates your batch job has succeeded, the results are available in the response field. Python import json # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" batch_job = client . batches . get ( name = job_name ) if batch_job . state . name == 'JOB_STATE_SUCCEEDED' : # If batch job was created with a file if batch_job . dest and batch_job . dest . file_name : # Results are in a file result_file_name = batch_job . dest . file_name print ( f "Results are in file: { result_file_name } " ) print ( "Downloading result file content..." ) file_content = client . files . download ( file = result_file_name ) # Process file_content (bytes) as needed print ( file_content . decode ( 'utf-8' )) # If batch job was created with inline request # (for embeddings, use batch_job.dest.inlined_embed_content_responses) elif batch_job . dest and batch_job . dest . inlined_responses : # Results are inline print ( "Results are inline:" ) for i , inline_response in enumerate ( batch_job . dest . inlined_responses ): print ( f "Response { i + 1 } :" ) if inline_response . response : # Accessing response, structure may vary. try : print ( inline_response . response . text ) except AttributeError : print ( inline_response . response ) # Fallback elif inline_response . error : print ( f "Error: { inline_response . error } " ) else : print ( "No results found (neither file nor inline)." ) else : print ( f "Job did not succeed. Final state: { batch_job . state . name } " ) if batch_job . error : print ( f "Error: { batch_job . 
error } " ) JavaScript // Use the name of the job you want to check // e.g., inlinedBatchJob.name from the previous step const jobName = "YOUR_BATCH_JOB_NAME" ; try { const batchJob = await ai . batches . get ({ name : jobName }); if ( batchJob . state === 'JOB_STATE_SUCCEEDED' ) { console . log ( 'Found completed batch:' , batchJob . displayName ); console . log ( batchJob ); // If batch job was created with a file destination if ( batchJob . dest ? . fileName ) { const resultFileName = batchJob . dest . fileName ; console . log ( `Results are in file: ${ resultFileName } ` ); console . log ( "Downloading result file content..." ); const fileContentBuffer = await ai . files . download ({ file : resultFileName }); // Process fileContentBuffer (Buffer) as needed console . log ( fileContentBuffer . toString ( 'utf-8' )); } // If batch job was created with inline responses else if ( batchJob . dest ? . inlinedResponses ) { console . log ( "Results are inline:" ); for ( let i = 0 ; i < batchJob . dest . inlinedResponses . length ; i ++ ) { const inlineResponse = batchJob . dest . inlinedResponses [ i ]; console . log ( `Response ${ i + 1 } :` ); if ( inlineResponse . response ) { // Accessing response, structure may vary. if ( inlineResponse . response . text !== undefined ) { console . log ( inlineResponse . response . text ); } else { console . log ( inlineResponse . response ); // Fallback } } else if ( inlineResponse . error ) { console . error ( `Error: ${ inlineResponse . error } ` ); } } } // If batch job was an embedding batch with inline responses else if ( batchJob . dest ? . inlinedEmbedContentResponses ) { console . log ( "Embedding results found inline:" ); for ( let i = 0 ; i < batchJob . dest . inlinedEmbedContentResponses . length ; i ++ ) { const inlineResponse = batchJob . dest . inlinedEmbedContentResponses [ i ]; console . log ( `Response ${ i + 1 } :` ); if ( inlineResponse . response ) { console . log ( inlineResponse . response ); } else if ( inlineResponse . error ) { console . error ( `Error: ${ inlineResponse . error } ` ); } } } else { console . log ( "No results found (neither file nor inline)." ); } } else { console . log ( `Job did not succeed. Final state: ${ batchJob . state } ` ); if ( batchJob . error ) { console . error ( `Error: ${ typeof batchJob . error === 'string' ? batchJob . error : batchJob . error . message || JSON . stringify ( batchJob . error ) } ` ); } } } catch ( error ) { console . 
error ( `An error occurred while processing job ${ jobName } :` , error ); } REST BATCH_NAME = "batches/123456" # Your batch job name curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null > batch_status.json if jq -r '.done' batch_status.json | grep -q "false" ; then echo "Batch has not finished processing" fi batch_state = $( jq -r '.metadata.state' batch_status.json ) if [[ $batch_state = "JOB_STATE_SUCCEEDED" ]] ; then if [[ $( jq '.response | has("inlinedResponses")' batch_status.json ) = "true" ]] ; then jq -r '.response.inlinedResponses' batch_status.json exit fi responses_file_name = $( jq -r '.response.responsesFile' batch_status.json ) curl https://generativelanguage.googleapis.com/download/v1beta/ $responses_file_name :download?alt = media \ -H "x-goog-api-key: $GEMINI_API_KEY " 2 > /dev/null elif [[ $batch_state = "JOB_STATE_FAILED" ]] ; then jq '.error' batch_status.json elif [[ $batch_state == "JOB_STATE_CANCELLED" ]] ; then echo "Batch was cancelled by the user" elif [[ $batch_state == "JOB_STATE_EXPIRED" ]] ; then echo "Batch expired after 48 hours" fi Cancelling a batch job You can cancel an ongoing batch job using its name. When a job is canceled, it stops processing new requests. Python # Cancel a batch job client . batches . cancel ( name = batch_job_to_cancel . name ) JavaScript await ai . batches . cancel ({ name : batchJobToCancel . name }); REST BATCH_NAME = "batches/123456" # Your batch job name # Cancel the batch curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :cancel \ -H "x-goog-api-key: $GEMINI_API_KEY " \ # Confirm that the status of the batch after cancellation is JOB_STATE_CANCELLED curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null | jq -r '.metadata.state' Deleting a batch job You can delete an existing batch job using its name. When a job is deleted, it stops processing new requests and is removed from the list of batch jobs. Python # Delete a batch job client . batches . delete ( name = batch_job_to_delete . name ) JavaScript await ai . batches . delete ({ name : batchJobToDelete . name }); REST BATCH_NAME = "batches/123456" # Your batch job name # Delete the batch job curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :delete \ -H "x-goog-api-key: $GEMINI_API_KEY " Technical details Supported models: Batch API supports a range of Gemini models. Refer to the Models page for each model's support of Batch API. The supported modalities for Batch API are the same as what's supported on the interactive (or non-batch) API. Pricing: Batch API usage is priced at 50% of the standard interactive API cost for the equivalent model. See the pricing page for details. Refer to the rate limits page for details on rate limits for this feature. Service Level Objective (SLO): Batch jobs are designed to complete within a 24-hour turnaround time. Many jobs may complete much faster depending on their size and current system load. Caching: Context caching is enabled for batch requests. If a request in your batch results in a cache hit, the cached tokens are priced the same as for non-batch API traffic. Best practices Use input files for large requests: For a large number of requests, always use the file input method for better manageability and to avoid hitting request size limits for the BatchGenerateContent call itself. 
Note that there's a the 2GB file size limit per input file. Error handling: Check the batchStats for failedRequestCount after a job completes. If using file output, parse each line to check if it's a GenerateContentResponse or a status object indicating an error for that specific request. See the troubleshooting guide for a complete set of error codes. Submit jobs once: The creation of a batch job is not idempotent. If you send the same creation request twice, two separate batch jobs will be created. Break up very large batches: While the target turnaround time is 24 hours, actual processing time can vary based on system load and job size. For large jobs, consider breaking them into smaller batches if intermediate results are needed sooner. What's next Check out the Batch API notebook for more examples. The OpenAI compatibility layer supports Batch API. Read the examples on the OpenAI Compatibility page. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-12 UTC.
text_content/docs_batch-api_cbe479b9.txt ADDED
@@ -0,0 +1,5 @@
 
 
 
 
 
 
1
+ URL: https://ai.google.dev/gemini-api/docs/batch-api#inline-requests
2
+ Title: Batch API  |  Gemini API  |  Google AI for Developers
3
+ ==================================================
4
+
5
+ Batch API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Batch API The Gemini Batch API is designed to process large volumes of requests asynchronously at 50% of the standard cost . The target turnaround time is 24 hours, but in majority of cases, it is much quicker. Use Batch API for large-scale, non-urgent tasks such as data pre-processing or running evaluations where an immediate response is not required. Creating a batch job You have two ways to submit your requests in Batch API: Inline Requests : A list of GenerateContentRequest objects directly included in your batch creation request. This is suitable for smaller batches that keep the total request size under 20MB. The output returned from the model is a list of inlineResponse objects. Input File : A JSON Lines (JSONL) file where each line contains a complete GenerateContentRequest object. This method is recommended for larger requests. The output returned from the model is a JSONL file where each line is either a GenerateContentResponse or a status object. Inline requests For a small number of requests, you can directly embed the GenerateContentRequest objects within your BatchGenerateContentRequest . The following example calls the BatchGenerateContent method with inline requests: Python from google import genai from google.genai import types client = genai . Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Tell me a one-sentence joke.' }], 'role' : 'user' }] }, { 'contents' : [{ 'parts' : [{ 'text' : 'Why is the sky blue?' }], 'role' : 'user' }] } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "inlined-requests-job-1" , }, ) print ( f "Created batch job: { inline_batch_job . name } " ) JavaScript import { GoogleGenAI } from '@google/genai' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const inlinedRequests = [ { contents : [{ parts : [{ text : 'Tell me a one-sentence joke.' }], role : 'user' }] }, { contents : [{ parts : [{ 'text' : 'Why is the sky blue?' }], role : 'user' }] } ] const response = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : inlinedRequests , config : { displayName : 'inlined-requests-job-1' , } }); console . log ( response ); REST curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -X POST \ -H "Content-Type:application/json" \ -d '{ "batch": { "display_name": "my-batch-requests", "input_config": { "requests": { "requests": [ { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-1" } }, { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-2" } } ] } } } }' Input file For larger sets of requests, prepare a JSON Lines (JSONL) file. 
Each line in this file must be a JSON object containing a user-defined key and a request object, where the request is a valid GenerateContentRequest object. The user-defined key is used in the response to indicate which output is the result of which request. For example, the request with the key defined as request-1 will have its response annotated with the same key name. This file is uploaded using the File API . The maximum allowed file size for an input file is 2GB. The following is an example of a JSONL file. You can save it in a file named my-batch-requests.json : { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}], "generation_config" : { "temperature" : 0.7 }}} { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} Similarly to inline requests, you can specify other parameters like system instructions, tools or other configurations in each request JSON. You can upload this file using the File API as shown in the following example. If you are working with multimodal input, you can reference other uploaded files within your JSONL file. Python from google import genai from google.genai import types client = genai . Client () # Create a sample JSONL file with open ( "my-batch-requests.jsonl" , "w" ) as f : requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}]}}, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} ] for req in requests : f . write ( json . dumps ( req ) + " \n " ) # Upload the file to the File API uploaded_file = client . files . upload ( file = 'my-batch-requests.jsonl' , config = types . UploadFileConfig ( display_name = 'my-batch-requests' , mime_type = 'jsonl' ) ) print ( f "Uploaded file: { uploaded_file . name } " ) JavaScript import { GoogleGenAI } from '@google/genai' ; import * as fs from "fs" ; import * as path from "path" ; import { fileURLToPath } from 'url' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const fileName = "my-batch-requests.jsonl" ; // Define the requests const requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }] }] } }, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }] }] } } ]; // Construct the full path to file const __filename = fileURLToPath ( import . meta . url ); const __dirname = path . dirname ( __filename ); const filePath = path . join ( __dirname , fileName ); // __dirname is the directory of the current script async function writeBatchRequestsToFile ( requests , filePath ) { try { // Use a writable stream for efficiency, especially with larger files. const writeStream = fs . createWriteStream ( filePath , { flags : 'w' }); writeStream . on ( 'error' , ( err ) = > { console . error ( `Error writing to file ${ filePath } :` , err ); }); for ( const req of requests ) { writeStream . write ( JSON . stringify ( req ) + '\n' ); } writeStream . end (); console . log ( `Successfully wrote batch requests to ${ filePath } ` ); } catch ( error ) { // This catch block is for errors that might occur before stream setup, // stream errors are handled by the 'error' event. console . 
error ( `An unexpected error occurred:` , error ); } } // Write to a file. writeBatchRequestsToFile ( requests , filePath ); // Upload the file to the File API. const uploadedFile = await ai . files . upload ({ file : 'my-batch-requests.jsonl' , config : { mimeType : 'jsonl' , }}); console . log ( uploadedFile . name ); REST tmp_batch_input_file = batch_input.tmp echo -e '{"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generationConfig": {"temperature": 0.7}}\n{"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}' > batch_input.tmp MIME_TYPE = $( file -b --mime-type " ${ tmp_batch_input_file } " ) NUM_BYTES = $( wc -c < " ${ tmp_batch_input_file } " ) DISPLAY_NAME = BatchInput tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -D " ${ tmp_header_file } " \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/jsonl" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ tmp_batch_input_file } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) The following example calls the BatchGenerateContent method with the input file uploaded using File API: Python # Assumes `uploaded_file` is the file object from the previous step file_batch_job = client . batches . create ( model = "gemini-2.5-flash" , src = uploaded_file . name , config = { 'display_name' : "file-upload-job-1" , }, ) print ( f "Created batch job: { file_batch_job . name } " ) JavaScript // Assumes `uploadedFile` is the file object from the previous step const fileBatchJob = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : uploadedFile . name , config : { displayName : 'file-upload-job-1' , } }); console . log ( fileBatchJob ); REST # Set the File ID taken from the upload response. BATCH_INPUT_FILE = 'files/123456' curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -X POST \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" \ -d "{ 'batch': { 'display_name': 'my-batch-requests', 'input_config': { 'file_name': ' ${ BATCH_INPUT_FILE } ' } } }" When you create a batch job, you will get a job name returned. Use this name for monitoring the job status as well as retrieving the results once the job completes. The following is an example output that contains a job name: Created batch job from file: batches/123456789 Batch embedding support You can use the Batch API to interact with the Embeddings model for higher throughput. To create an embeddings batch job with either inline requests or input files , use the batches.create_embeddings API and specify the embeddings model. Python # Creating an embeddings batch job with an input file request: file_job = client . batches . 
create_embeddings ( model = "gemini-embedding-001" , src = { 'file_name' : uploaded_batch_requests . name }, config = { 'display_name' : "Input embeddings batch" }, ) # Creating an embeddings batch job with an inline request: batch_job = client . batches . create_embeddings ( model = "gemini-embedding-001" , # For a predefined list of requests `inlined_requests` src = { 'inlined_requests' : inlined_requests }, config = { 'display_name' : "Inlined embeddings batch" }, ) JavaScript // Creating an embeddings batch job with an input file request: let fileJob ; fileJob = await client . batches . createEmbeddings ({ model : 'gemini-embedding-001' , src : { fileName : uploadedBatchRequests . name }, config : { displayName : 'Input embeddings batch' }, }); console . log ( `Created batch job: ${ fileJob . name } ` ); // Creating an embeddings batch job with an inline request: let batchJob ; batchJob = await client . batches . createEmbeddings ({ model : 'gemini-embedding-001' , // For a predefined list of requests `inlinedRequests` src : { inlinedRequests : inlinedRequests }, config : { displayName : 'Inlined embeddings batch' }, }); console . log ( `Created batch job: ${ batchJob . name } ` ); Read the Embeddings section in the Batch API cookbook for more examples. Request configuration You can include any request configurations you would use in a standard non-batch request. For example, you could specify the temperature, system instructions or even pass in other modalities. The following example shows an inline request that contains a system instruction for one of the requests: Python inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cloud.' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cat.' }]}], 'system_instructions' : { 'parts' : [{ 'text' : 'You are a cat. Your name is Neko.' }]}} ] JavaScript inlineRequestsList = [ { contents : [{ parts : [{ text : 'Write a short poem about a cloud.' }]}]}, { contents : [{ parts : [{ text : 'Write a short poem about a cat.' }]}], systemInstructions : { parts : [{ text : 'You are a cat. Your name is Neko.' }]}} ] Similarly, you can specify tools to use for a request. The following example shows a request that enables the Google Search tool : Python inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 1998?' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 2025?' }]}], 'tools' : [{ 'google_search' : {}}]} ] JavaScript inlineRequestsList = [ { contents : [{ parts : [{ text : 'Who won the euro 1998?' }]}]}, { contents : [{ parts : [{ text : 'Who won the euro 2025?' }]}], tools : [{ googleSearch : {}}]} ] You can specify structured output as well. The following example shows how to specify structured output for your batch requests. Python from google import genai from pydantic import BaseModel , TypeAdapter class Recipe ( BaseModel ): recipe_name : str ingredients : list [ str ] client = genai . Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'List a few popular cookie recipes, and include the amounts of ingredients.' }], 'role' : 'user' }], 'config' : { 'response_mime_type' : 'application/json' , 'response_schema' : list [ Recipe ] } }, { 'contents' : [{ 'parts' : [{ 'text' : 'List a few popular gluten free cookie recipes, and include the amounts of ingredients.'
}], 'role' : 'user' }], 'config' : { 'response_mime_type' : 'application/json' , 'response_schema' : list [ Recipe ] } } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "structured-output-job-1" }, ) # wait for the job to finish job_name = inline_batch_job . name print ( f "Polling status for job: { job_name } " ) while True : batch_job_inline = client . batches . get ( name = job_name ) if batch_job_inline . state . name in ( 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' ): break print ( f "Job not finished. Current state: { batch_job_inline . state . name } . Waiting 30 seconds..." ) time . sleep ( 30 ) print ( f "Job finished with state: { batch_job_inline . state . name } " ) # print the response for i , inline_response in enumerate ( batch_job_inline . dest . inlined_responses , start = 1 ): print ( f " \n --- Response { i } ---" ) # Check for a successful response if inline_response . response : # The .text property is a shortcut to the generated text. print ( inline_response . response . text ) JavaScript import { GoogleGenAI , Type } from '@google/genai' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const inlinedRequests = [ { contents : [{ parts : [{ text : 'List a few popular cookie recipes, and include the amounts of ingredients.' }], role : 'user' }], config : { responseMimeType : 'application/json' , responseSchema : { type : Type . ARRAY , items : { type : Type . OBJECT , properties : { 'recipeName' : { type : Type . STRING , description : 'Name of the recipe' , nullable : false , }, 'ingredients' : { type : Type . ARRAY , items : { type : Type . STRING , description : 'Ingredients of the recipe' , nullable : false , }, }, }, required : [ 'recipeName' ], }, }, } }, { contents : [{ parts : [{ text : 'List a few popular gluten free cookie recipes, and include the amounts of ingredients.' }], role : 'user' }], config : { responseMimeType : 'application/json' , responseSchema : { type : Type . ARRAY , items : { type : Type . OBJECT , properties : { 'recipeName' : { type : Type . STRING , description : 'Name of the recipe' , nullable : false , }, 'ingredients' : { type : Type . ARRAY , items : { type : Type . STRING , description : 'Ingredients of the recipe' , nullable : false , }, }, }, required : [ 'recipeName' ], }, }, } } ] const inlinedBatchJob = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : inlinedRequests , config : { displayName : 'inlined-requests-job-1' , } }); Monitoring job status Use the operation name obtained when creating the batch job to poll its status. The state field of the batch job will indicate its current status. A batch job can be in one of the following states: JOB_STATE_PENDING : The job has been created and is waiting to be processed by the service. JOB_STATE_RUNNING : The job is in progress. JOB_STATE_SUCCEEDED : The job completed successfully. You can now retrieve the results. JOB_STATE_FAILED : The job failed. Check the error details for more information. JOB_STATE_CANCELLED : The job was cancelled by the user. JOB_STATE_EXPIRED : The job has expired because it was running or pending for more than 48 hours. The job will not have any results to retrieve. You can try submitting the job again or splitting up the requests into smaller batches. You can poll the job status periodically to check for completion. 
Python # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" # (e.g. 'batches/your-batch-id') batch_job = client . batches . get ( name = job_name ) completed_states = set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' , ]) print ( f "Polling status for job: { job_name } " ) batch_job = client . batches . get ( name = job_name ) # Initial get while batch_job . state . name not in completed_states : print ( f "Current state: { batch_job . state . name } " ) time . sleep ( 30 ) # Wait for 30 seconds before polling again batch_job = client . batches . get ( name = job_name ) print ( f "Job finished with state: { batch_job . state . name } " ) if batch_job . state . name == 'JOB_STATE_FAILED' : print ( f "Error: { batch_job . error } " ) JavaScript // Use the name of the job you want to check // e.g., inlinedBatchJob.name from the previous step let batchJob ; const completedStates = new Set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' , ]); try { batchJob = await ai . batches . get ({ name : inlinedBatchJob . name }); while ( ! completedStates . has ( batchJob . state )) { console . log ( `Current state: ${ batchJob . state } ` ); // Wait for 30 seconds before polling again await new Promise ( resolve = > setTimeout ( resolve , 30000 )); batchJob = await client . batches . get ({ name : batchJob . name }); } console . log ( `Job finished with state: ${ batchJob . state } ` ); if ( batchJob . state === 'JOB_STATE_FAILED' ) { // The exact structure of `error` might vary depending on the SDK // This assumes `error` is an object with a `message` property. console . error ( `Error: ${ batchJob . state } ` ); } } catch ( error ) { console . error ( `An error occurred while polling job ${ batchJob . name } :` , error ); } Retrieving results Once the job status indicates your batch job has succeeded, the results are available in the response field. Python import json # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" batch_job = client . batches . get ( name = job_name ) if batch_job . state . name == 'JOB_STATE_SUCCEEDED' : # If batch job was created with a file if batch_job . dest and batch_job . dest . file_name : # Results are in a file result_file_name = batch_job . dest . file_name print ( f "Results are in file: { result_file_name } " ) print ( "Downloading result file content..." ) file_content = client . files . download ( file = result_file_name ) # Process file_content (bytes) as needed print ( file_content . decode ( 'utf-8' )) # If batch job was created with inline request # (for embeddings, use batch_job.dest.inlined_embed_content_responses) elif batch_job . dest and batch_job . dest . inlined_responses : # Results are inline print ( "Results are inline:" ) for i , inline_response in enumerate ( batch_job . dest . inlined_responses ): print ( f "Response { i + 1 } :" ) if inline_response . response : # Accessing response, structure may vary. try : print ( inline_response . response . text ) except AttributeError : print ( inline_response . response ) # Fallback elif inline_response . error : print ( f "Error: { inline_response . error } " ) else : print ( "No results found (neither file nor inline)." ) else : print ( f "Job did not succeed. Final state: { batch_job . state . name } " ) if batch_job . error : print ( f "Error: { batch_job . 
error } " ) JavaScript // Use the name of the job you want to check // e.g., inlinedBatchJob.name from the previous step const jobName = "YOUR_BATCH_JOB_NAME" ; try { const batchJob = await ai . batches . get ({ name : jobName }); if ( batchJob . state === 'JOB_STATE_SUCCEEDED' ) { console . log ( 'Found completed batch:' , batchJob . displayName ); console . log ( batchJob ); // If batch job was created with a file destination if ( batchJob . dest ? . fileName ) { const resultFileName = batchJob . dest . fileName ; console . log ( `Results are in file: ${ resultFileName } ` ); console . log ( "Downloading result file content..." ); const fileContentBuffer = await ai . files . download ({ file : resultFileName }); // Process fileContentBuffer (Buffer) as needed console . log ( fileContentBuffer . toString ( 'utf-8' )); } // If batch job was created with inline responses else if ( batchJob . dest ? . inlinedResponses ) { console . log ( "Results are inline:" ); for ( let i = 0 ; i < batchJob . dest . inlinedResponses . length ; i ++ ) { const inlineResponse = batchJob . dest . inlinedResponses [ i ]; console . log ( `Response ${ i + 1 } :` ); if ( inlineResponse . response ) { // Accessing response, structure may vary. if ( inlineResponse . response . text !== undefined ) { console . log ( inlineResponse . response . text ); } else { console . log ( inlineResponse . response ); // Fallback } } else if ( inlineResponse . error ) { console . error ( `Error: ${ inlineResponse . error } ` ); } } } // If batch job was an embedding batch with inline responses else if ( batchJob . dest ? . inlinedEmbedContentResponses ) { console . log ( "Embedding results found inline:" ); for ( let i = 0 ; i < batchJob . dest . inlinedEmbedContentResponses . length ; i ++ ) { const inlineResponse = batchJob . dest . inlinedEmbedContentResponses [ i ]; console . log ( `Response ${ i + 1 } :` ); if ( inlineResponse . response ) { console . log ( inlineResponse . response ); } else if ( inlineResponse . error ) { console . error ( `Error: ${ inlineResponse . error } ` ); } } } else { console . log ( "No results found (neither file nor inline)." ); } } else { console . log ( `Job did not succeed. Final state: ${ batchJob . state } ` ); if ( batchJob . error ) { console . error ( `Error: ${ typeof batchJob . error === 'string' ? batchJob . error : batchJob . error . message || JSON . stringify ( batchJob . error ) } ` ); } } } catch ( error ) { console . 
error ( `An error occurred while processing job ${ jobName } :` , error ); } REST BATCH_NAME = "batches/123456" # Your batch job name curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null > batch_status.json if jq -r '.done' batch_status.json | grep -q "false" ; then echo "Batch has not finished processing" fi batch_state = $( jq -r '.metadata.state' batch_status.json ) if [[ $batch_state = "JOB_STATE_SUCCEEDED" ]] ; then if [[ $( jq '.response | has("inlinedResponses")' batch_status.json ) = "true" ]] ; then jq -r '.response.inlinedResponses' batch_status.json exit fi responses_file_name = $( jq -r '.response.responsesFile' batch_status.json ) curl https://generativelanguage.googleapis.com/download/v1beta/ $responses_file_name :download?alt = media \ -H "x-goog-api-key: $GEMINI_API_KEY " 2 > /dev/null elif [[ $batch_state = "JOB_STATE_FAILED" ]] ; then jq '.error' batch_status.json elif [[ $batch_state == "JOB_STATE_CANCELLED" ]] ; then echo "Batch was cancelled by the user" elif [[ $batch_state == "JOB_STATE_EXPIRED" ]] ; then echo "Batch expired after 48 hours" fi Cancelling a batch job You can cancel an ongoing batch job using its name. When a job is canceled, it stops processing new requests. Python # Cancel a batch job client . batches . cancel ( name = batch_job_to_cancel . name ) JavaScript await ai . batches . cancel ({ name : batchJobToCancel . name }); REST BATCH_NAME = "batches/123456" # Your batch job name # Cancel the batch curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :cancel \ -H "x-goog-api-key: $GEMINI_API_KEY " \ # Confirm that the status of the batch after cancellation is JOB_STATE_CANCELLED curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null | jq -r '.metadata.state' Deleting a batch job You can delete an existing batch job using its name. When a job is deleted, it stops processing new requests and is removed from the list of batch jobs. Python # Delete a batch job client . batches . delete ( name = batch_job_to_delete . name ) JavaScript await ai . batches . delete ({ name : batchJobToDelete . name }); REST BATCH_NAME = "batches/123456" # Your batch job name # Delete the batch job curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :delete \ -H "x-goog-api-key: $GEMINI_API_KEY " Technical details Supported models: Batch API supports a range of Gemini models. Refer to the Models page for each model's support of Batch API. The supported modalities for Batch API are the same as what's supported on the interactive (or non-batch) API. Pricing: Batch API usage is priced at 50% of the standard interactive API cost for the equivalent model. See the pricing page for details. Refer to the rate limits page for details on rate limits for this feature. Service Level Objective (SLO): Batch jobs are designed to complete within a 24-hour turnaround time. Many jobs may complete much faster depending on their size and current system load. Caching: Context caching is enabled for batch requests. If a request in your batch results in a cache hit, the cached tokens are priced the same as for non-batch API traffic. Best practices Use input files for large requests: For a large number of requests, always use the file input method for better manageability and to avoid hitting request size limits for the BatchGenerateContent call itself. 
Note that there's a 2GB file size limit per input file. Error handling: Check the batchStats for failedRequestCount after a job completes. If using file output, parse each line to check if it's a GenerateContentResponse or a status object indicating an error for that specific request (a minimal parsing sketch follows at the end of this page). See the troubleshooting guide for a complete set of error codes. Submit jobs once: The creation of a batch job is not idempotent. If you send the same creation request twice, two separate batch jobs will be created. Break up very large batches: While the target turnaround time is 24 hours, actual processing time can vary based on system load and job size. For large jobs, consider breaking them into smaller batches if intermediate results are needed sooner. What's next Check out the Batch API notebook for more examples. The OpenAI compatibility layer supports Batch API. Read the examples on the OpenAI Compatibility page. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-12 UTC.
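The error-handling best practice above is easier to apply with a concrete illustration. The following minimal Python sketch shows one way to parse a downloaded result file line by line and match each result back to its user-defined key. It assumes file_content is the bytes object returned by client.files.download(file=result_file_name) from the "Retrieving results" section, and it assumes each output line carries the request's key plus either a response or an error/status field; those per-line field names are an assumption here, so adjust them to match what your result file actually contains.

import json

# Minimal sketch: `file_content` is the bytes object returned by
# client.files.download(file=result_file_name) in the "Retrieving results" step.
# The per-line field names ('key', 'response', 'error', 'status') are assumptions;
# adapt them to the fields present in your own result file.
results_by_key = {}
failed_keys = []

for line in file_content.decode("utf-8").splitlines():
    if not line.strip():
        continue  # skip any blank lines
    entry = json.loads(line)
    key = entry.get("key")  # the user-defined key from the input JSONL
    if entry.get("response"):
        # Successful request: keep the GenerateContentResponse payload.
        results_by_key[key] = entry["response"]
    else:
        # Failed request: record the key and surface the status/error object.
        failed_keys.append(key)
        print(f"Request {key} failed: {entry.get('error') or entry.get('status')}")

print(f"{len(results_by_key)} responses parsed, {len(failed_keys)} failures.")

Comparing len(failed_keys) against the job's batchStats failedRequestCount is a quick consistency check before deciding whether to resubmit the failed requests as a new, smaller batch.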
text_content/docs_batch-mode_27b99003.txt CHANGED
@@ -1,5 +1,5 @@
1
  URL: https://ai.google.dev/gemini-api/docs/batch-mode#batch-job-status
2
- Title: Batch Mode  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Batch Mode | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Batch Mode The Gemini API's Batch Mode is designed to process large volumes of requests asynchronously at 50% of the standard cost . The target turnaround time is 24 hours, but in majority of cases, it is much quicker. Use Batch Mode for large-scale, non-urgent tasks such as data pre-processing or running evaluations where an immediate response is not required. Note: You can use Batch Mode with the Gemini API Python SDK or the REST API. Support for Batch Mode in the Gemini API JavaScript SDK is coming soon. Getting Started This section helps you get started with submitting your first requests in batch mode. Creating a batch job You have two ways to submit your requests in Batch Mode: Inline Requests : A list of GenerateContentRequest objects directly included in your batch creation request. This is suitable for smaller batches that keep the total request size under 20MB. The output returned from the model is a list of inlineResponse objects. Input File : A JSON Lines (JSONL) file where each line contains a complete GenerateContentRequest object. This method is recommended for larger requests. The output returned from the model is a JSONL file where each line is either a GenerateContentResponse or a status object. Inline requests For a small number of requests, you can directly embed the GenerateContentRequest objects within your BatchGenerateContentRequest . The following example calls the BatchGenerateContent method with inline requests: Python from google import genai from google.genai import types client = genai . Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Tell me a one-sentence joke.' }], 'role' : 'user' }] }, { 'contents' : [{ 'parts' : [{ 'text' : 'Why is the sky blue?' }], 'role' : 'user' }] } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "inlined-requests-job-1" , }, ) print ( f "Created batch job: { inline_batch_job . name } " ) REST curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -X POST \ -H "Content-Type:application/json" \ -d '{ "batch": { "display_name": "my-batch-requests", "input_config": { "requests": { "requests": [ { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-1" } }, { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-2" } } ] } } } }' You can use any requests you would use in non-batch (or interactive) mode. For example, you could specify the temperature, system instructions or even pass in other modalities. The following example shows some example inline requests that contain a system instruction for one of the requests: inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cloud.' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cat.' 
}]}], 'system_instructions' : { 'parts' : [{ 'text' : 'You are a cat. Your name is Neko.' }]}} ] Similarly can also specify tools to use for a request. The following example shows a request that enables the Google Search tool : inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 1998?' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 2025?' }]}], 'tools' : [{ 'google_search ' : {}}]} ] Input file For larger sets of requests, prepare a JSON Lines (JSONL) file. Each line in this file must be a JSON object containing a user-defined key and a request object, where the request is a valid GenerateContentRequest object. The user-defined key is used in the response to indicate which output is the result of which request. For example, the request with the key defined as request-1 will have its response annotated with the same key name. This file is uploaded using the File API . The maximum allowed file size for an input file is 2GB. The following is an example of a JSONL file. You can save it in a file named my-batch-requests.json : { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}], "generation_config" : { "temperature" : 0.7 }}} { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} Similarly to inline requests, you can specify other parameters like system instructions, tools or other configurations in each request JSON. You can upload this file using the File API as shown in the following example. If you are working with multimodal input, you can reference other uploaded files within your JSONL file. Python from google import genai from google.genai import types client = genai . Client () # Create a sample JSONL file with open ( "my-batch-requests.jsonl" , "w" ) as f : requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}]}}, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} ] for req in requests : f . write ( json . dumps ( req ) + " \n " ) # Upload the file to the File API uploaded_file = client . files . upload ( file = 'my-batch-requests.jsonl' , config = types . UploadFileConfig ( display_name = 'my-batch-requests' , mime_type = 'jsonl' ) ) print ( f "Uploaded file: { uploaded_file . name } " ) REST tmp_batch_input_file = batch_input.tmp echo -e '{"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generationConfig": {"temperature": 0.7}}\n{"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}' > batch_input.tmp MIME_TYPE = $( file -b --mime-type " ${ tmp_batch_input_file } " ) NUM_BYTES = $( wc -c < " ${ tmp_batch_input_file } " ) DISPLAY_NAME = BatchInput tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. 
curl "https://generativelanguage.googleapis.com/upload/v1beta/files \ -D " ${ tmp_header_file } " \ -H " x-goog-api-key: $GEMINI_API_KEY " \ -H " X-Goog-Upload-Protocol: resumable " \ -H " X-Goog-Upload-Command: start " \ -H " X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H " X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H " Content-Type: application/jsonl " \ -d " { 'file' : { 'display_name' : '${DISPLAY_NAME}' }} " 2> /dev/null upload_url= $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H " Content-Length: ${ NUM_BYTES } " \ -H " X-Goog-Upload-Offset: 0 " \ -H " X-Goog-Upload-Command: upload, finalize " \ --data-binary " @ ${ tmp_batch_input_file } " 2> /dev/null > file_info.json file_uri= $( jq ".file.uri" file_info.json ) The following example calls the BatchGenerateContent method with the input file uploaded using File API: Python # Assumes `uploaded_file` is the file object from the previous step file_batch_job = client . batches . create ( model = "gemini-2.5-flash" , src = uploaded_file . name , config = { 'display_name' : "file-upload-job-1" , }, ) print ( f "Created batch job: { file_batch_job . name } " ) REST BATCH_INPUT_FILE = 'files/123456' # File ID curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -X POST \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" \ -d "{ 'batch': { 'display_name': 'my-batch-requests', 'input_config': { 'requests': { 'file_name': ${ BATCH_INPUT_FILE } } } } }" When you create a batch job, you will get a job name returned. Use this name for monitoring the job status as well as retrieving the results once the job completes. The following is an example output that contains a job name: Created batch job from file: batches/123456789 Monitoring job status Use the operation name obtained when creating the batch job to poll its status. The state field of the batch job will indicate its current status. A batch job can be in one of the following states: JOB_STATE_PENDING : The job has been created and is waiting to be processed by the service. JOB_STATE_SUCCEEDED : The job completed successfully. You can now retrieve the results. JOB_STATE_FAILED : The job failed. Check the error details for more information. JOB_STATE_CANCELLED : The job was cancelled by the user. You can poll the job status periodically to check for completion. Python # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" # (e.g. 'batches/your-batch-id') batch_job = client . batches . get ( name = job_name ) completed_states = set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , ]) print ( f "Polling status for job: { job_name } " ) batch_job = client . batches . get ( name = job_name ) # Initial get while batch_job . state . name not in completed_states : print ( f "Current state: { batch_job . state . name } " ) time . sleep ( 30 ) # Wait for 30 seconds before polling again batch_job = client . batches . get ( name = job_name ) print ( f "Job finished with state: { batch_job . state . name } " ) if batch_job . state . name == 'JOB_STATE_FAILED' : print ( f "Error: { batch_job . error } " ) Retrieving results Once the job status indicates your batch job has succeeded, the results are available in the response field. 
Python import json # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" batch_job = client . batches . get ( name = job_name ) if batch_job . state . name == 'JOB_STATE_SUCCEEDED' : # If batch job was created with a file if batch_job . dest and batch_job . dest . file_name : # Results are in a file result_file_name = batch_job . dest . file_name print ( f "Results are in file: { result_file_name } " ) print ( "Downloading result file content..." ) file_content = client . files . download ( file = result_file_name ) # Process file_content (bytes) as needed print ( file_content . decode ( 'utf-8' )) # If batch job was created with inline request elif batch_job . dest and batch_job . dest . inlined_responses : # Results are inline print ( "Results are inline:" ) for i , inline_response in enumerate ( batch_job . dest . inlined_responses ): print ( f "Response { i + 1 } :" ) if inline_response . response : # Accessing response, structure may vary. try : print ( inline_response . response . text ) except AttributeError : print ( inline_response . response ) # Fallback elif inline_response . error : print ( f "Error: { inline_response . error } " ) else : print ( "No results found (neither file nor inline)." ) else : print ( f "Job did not succeed. Final state: { batch_job . state . name } " ) if batch_job . error : print ( f "Error: { batch_job . error } " ) REST BATCH_NAME = "batches/123456" # Your batch job name curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null > batch_status.json if jq -r '.done' batch_status.json | grep -q "false" ; then echo "Batch has not finished processing" fi batch_state = $( jq -r '.metadata.state' batch_status.json ) if [[ $batch_state = "JOB_STATE_SUCCEEDED" ]] ; then if [[ $( jq '.response | has("inlinedResponses")' batch_status.json ) = "true" ]] ; then jq -r '.response.inlinedResponses' batch_status.json exit fi responses_file_name = $( jq -r '.response.responsesFile' batch_status.json ) curl https://generativelanguage.googleapis.com/download/v1beta/ $responses_file_name :download?alt = media \ -H "x-goog-api-key: $GEMINI_API_KEY " 2 > /dev/null elif [[ $batch_state = "JOB_STATE_FAILED" ]] ; then jq '.error' batch_status.json elif [[ $batch_state == "JOB_STATE_CANCELLED" ]] ; then echo "Batch was cancelled by the user" fi Cancelling a batch job You can cancel an ongoing batch job using its name. When a job is canceled, it stops processing new requests. Python # Cancel a batch job client . batches . cancel ( name = batch_job_to_cancel . name ) REST BATCH_NAME = "batches/123456" # Your batch job name # Cancel the batch curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :cancel \ -H "x-goog-api-key: $GEMINI_API_KEY " \ # Confirm that the status of the batch after cancellation is JOB_STATE_CANCELLED curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null | jq -r '.metadata.state' Deleting a batch job You can delete an existing batch job using its name. When a job is deleted, it stops processing new requests and is removed from the list of batch jobs. Python # Delete a batch job client . batches . delete ( name = batch_job_to_delete . 
name ) REST BATCH_NAME = "batches/123456" # Your batch job name # Cancel the batch curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :delete \ -H "x-goog-api-key: $GEMINI_API_KEY " \ Technical details Supported models: Batch Mode supports a range of Gemini models. Refer to the Models page for the latest list of compatible models. The supported modalities for Batch Mode are the same as what's supported on the interactive (or non-batch mode) API. Pricing: Batch Mode usage is priced at 50% of the standard interactive API cost for the equivalent model. Service Level Objective (SLO): Batch jobs are designed to complete within a 24-hour turnaround time. Many jobs may complete much faster depending on their size and current system load. Caching: Context caching is enabled for batch requests. If a request in your batch results in a cache hit, the cached tokens are priced the same as for non-batch mode traffic. Best practices Use input files for large requests: For a large number of requests, always use the file input method for better manageability and to avoid hitting request size limits for the BatchGenerateContent call itself. Note that there's a the 2GB file size limit per input file. Error handling: Check the batchStats for failedRequestCount after a job completes. If using file output, parse each line to check if it's a GenerateContentResponse or a status object indicating an error for that specific request. Submit jobs once: The creation of a batch job is not idempotent. If you send the same creation request twice, two separate batch jobs will be created. Break up very large batches: While the target turnaround time is 24 hours, actual processing time can vary based on system load and job size. For large jobs, consider breaking them into smaller batches if intermediate results are needed sooner. What's next Check out the batch mode notebook for more examples. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-07 UTC.
 
1
  URL: https://ai.google.dev/gemini-api/docs/batch-mode#batch-job-status
2
+ Title: Batch API  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Batch API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Batch API The Gemini Batch API is designed to process large volumes of requests asynchronously at 50% of the standard cost . The target turnaround time is 24 hours, but in majority of cases, it is much quicker. Use Batch API for large-scale, non-urgent tasks such as data pre-processing or running evaluations where an immediate response is not required. Creating a batch job You have two ways to submit your requests in Batch API: Inline Requests : A list of GenerateContentRequest objects directly included in your batch creation request. This is suitable for smaller batches that keep the total request size under 20MB. The output returned from the model is a list of inlineResponse objects. Input File : A JSON Lines (JSONL) file where each line contains a complete GenerateContentRequest object. This method is recommended for larger requests. The output returned from the model is a JSONL file where each line is either a GenerateContentResponse or a status object. Inline requests For a small number of requests, you can directly embed the GenerateContentRequest objects within your BatchGenerateContentRequest . The following example calls the BatchGenerateContent method with inline requests: Python from google import genai from google.genai import types client = genai . Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Tell me a one-sentence joke.' }], 'role' : 'user' }] }, { 'contents' : [{ 'parts' : [{ 'text' : 'Why is the sky blue?' }], 'role' : 'user' }] } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "inlined-requests-job-1" , }, ) print ( f "Created batch job: { inline_batch_job . name } " ) JavaScript import { GoogleGenAI } from '@google/genai' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const inlinedRequests = [ { contents : [{ parts : [{ text : 'Tell me a one-sentence joke.' }], role : 'user' }] }, { contents : [{ parts : [{ 'text' : 'Why is the sky blue?' }], role : 'user' }] } ] const response = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : inlinedRequests , config : { displayName : 'inlined-requests-job-1' , } }); console . log ( response ); REST curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -X POST \ -H "Content-Type:application/json" \ -d '{ "batch": { "display_name": "my-batch-requests", "input_config": { "requests": { "requests": [ { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-1" } }, { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-2" } } ] } } } }' Input file For larger sets of requests, prepare a JSON Lines (JSONL) file. 
Each line in this file must be a JSON object containing a user-defined key and a request object, where the request is a valid GenerateContentRequest object. The user-defined key is used in the response to indicate which output is the result of which request. For example, the request with the key defined as request-1 will have its response annotated with the same key name. This file is uploaded using the File API . The maximum allowed file size for an input file is 2GB. The following is an example of a JSONL file. You can save it in a file named my-batch-requests.json : { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}], "generation_config" : { "temperature" : 0.7 }}} { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} Similarly to inline requests, you can specify other parameters like system instructions, tools or other configurations in each request JSON. You can upload this file using the File API as shown in the following example. If you are working with multimodal input, you can reference other uploaded files within your JSONL file. Python from google import genai from google.genai import types client = genai . Client () # Create a sample JSONL file with open ( "my-batch-requests.jsonl" , "w" ) as f : requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}]}}, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} ] for req in requests : f . write ( json . dumps ( req ) + " \n " ) # Upload the file to the File API uploaded_file = client . files . upload ( file = 'my-batch-requests.jsonl' , config = types . UploadFileConfig ( display_name = 'my-batch-requests' , mime_type = 'jsonl' ) ) print ( f "Uploaded file: { uploaded_file . name } " ) JavaScript import { GoogleGenAI } from '@google/genai' ; import * as fs from "fs" ; import * as path from "path" ; import { fileURLToPath } from 'url' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const fileName = "my-batch-requests.jsonl" ; // Define the requests const requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }] }] } }, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }] }] } } ]; // Construct the full path to file const __filename = fileURLToPath ( import . meta . url ); const __dirname = path . dirname ( __filename ); const filePath = path . join ( __dirname , fileName ); // __dirname is the directory of the current script async function writeBatchRequestsToFile ( requests , filePath ) { try { // Use a writable stream for efficiency, especially with larger files. const writeStream = fs . createWriteStream ( filePath , { flags : 'w' }); writeStream . on ( 'error' , ( err ) = > { console . error ( `Error writing to file ${ filePath } :` , err ); }); for ( const req of requests ) { writeStream . write ( JSON . stringify ( req ) + '\n' ); } writeStream . end (); console . log ( `Successfully wrote batch requests to ${ filePath } ` ); } catch ( error ) { // This catch block is for errors that might occur before stream setup, // stream errors are handled by the 'error' event. console . 
error ( `An unexpected error occurred:` , error ); } } // Write to a file. writeBatchRequestsToFile ( requests , filePath ); // Upload the file to the File API. const uploadedFile = await ai . files . upload ({ file : 'my-batch-requests.jsonl' , config : { mimeType : 'jsonl' , }}); console . log ( uploadedFile . name ); REST tmp_batch_input_file = batch_input.tmp echo -e '{"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generationConfig": {"temperature": 0.7}}\n{"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}' > batch_input.tmp MIME_TYPE = $( file -b --mime-type " ${ tmp_batch_input_file } " ) NUM_BYTES = $( wc -c < " ${ tmp_batch_input_file } " ) DISPLAY_NAME = BatchInput tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -D " ${ tmp_header_file } " \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/jsonl" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ tmp_batch_input_file } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) The following example calls the BatchGenerateContent method with the input file uploaded using File API: Python # Assumes `uploaded_file` is the file object from the previous step file_batch_job = client . batches . create ( model = "gemini-2.5-flash" , src = uploaded_file . name , config = { 'display_name' : "file-upload-job-1" , }, ) print ( f "Created batch job: { file_batch_job . name } " ) JavaScript // Assumes `uploadedFile` is the file object from the previous step const fileBatchJob = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : uploadedFile . name , config : { displayName : 'file-upload-job-1' , } }); console . log ( fileBatchJob ); REST # Set the File ID taken from the upload response. BATCH_INPUT_FILE = 'files/123456' curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -X POST \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" \ -d "{ 'batch': { 'display_name': 'my-batch-requests', 'input_config': { 'file_name': ' ${ BATCH_INPUT_FILE } ' } } }" When you create a batch job, you will get a job name returned. Use this name for monitoring the job status as well as retrieving the results once the job completes. The following is an example output that contains a job name: Created batch job from file: batches/123456789 Batch embedding support You can use the Batch API to interact with the Embeddings model for higher throughput. To create an embeddings batch job with either inline requests or input files , use the batches.create_embeddings API and specify the embeddings model. Python # Creating an embeddings batch job with an input file request: file_job = client . batches . 
create_embeddings ( model = "gemini-embedding-001" , src = { 'file_name' : uploaded_batch_requests . name }, config = { 'display_name' : "Input embeddings batch" }, ) # Creating an embeddings batch job with an inline request: batch_job = client . batches . create_embeddings ( model = "gemini-embedding-001" , # For a predefined list of requests `inlined_requests` src = { 'inlined_requests' : inlined_requests }, config = { 'display_name' : "Inlined embeddings batch" }, ) JavaScript // Creating an embeddings batch job with an input file request: let fileJob ; fileJob = await client . batches . createEmbeddings ({ model : 'gemini-embedding-001' , src : { fileName : uploadedBatchRequests . name }, config : { displayName : 'Input embeddings batch' }, }); console . log ( `Created batch job: ${ fileJob . name } ` ); // Creating an embeddings batch job with an inline request: let batchJob ; batchJob = await client . batches . createEmbeddings ({ model : 'gemini-embedding-001' , // For a predefined list of requests `inlinedRequests` src : { inlinedRequests : inlinedRequests }, config : { displayName : 'Inlined embeddings batch' }, }); console . log ( `Created batch job: ${ batchJob . name } ` ); Read the Embeddings section in the Batch API cookbook for more examples. Request configuration You can include any request configurations you would use in a standard non-batch request. For example, you could specify the temperature, system instructions or even pass in other modalities. The following example shows an inline request that contains a system instruction for one of the requests: Python inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cloud.' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cat.' }]}], 'system_instructions' : { 'parts' : [{ 'text' : 'You are a cat. Your name is Neko.' }]}} ] JavaScript inlineRequestsList = [ { contents : [{ parts : [{ text : 'Write a short poem about a cloud.' }]}]}, { contents : [{ parts : [{ text : 'Write a short poem about a cat.' }]}], systemInstructions : { parts : [{ text : 'You are a cat. Your name is Neko.' }]}} ] Similarly, you can specify tools to use for a request. The following example shows a request that enables the Google Search tool : Python inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 1998?' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 2025?' }]}], 'tools' : [{ 'google_search' : {}}]} ] JavaScript inlineRequestsList = [ { contents : [{ parts : [{ text : 'Who won the euro 1998?' }]}]}, { contents : [{ parts : [{ text : 'Who won the euro 2025?' }]}], tools : [{ googleSearch : {}}]} ] You can specify structured output as well. The following example shows how to specify structured output for your batch requests. Python from google import genai from pydantic import BaseModel , TypeAdapter class Recipe ( BaseModel ): recipe_name : str ingredients : list [ str ] client = genai . Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'List a few popular cookie recipes, and include the amounts of ingredients.' }], 'role' : 'user' }], 'config' : { 'response_mime_type' : 'application/json' , 'response_schema' : list [ Recipe ] } }, { 'contents' : [{ 'parts' : [{ 'text' : 'List a few popular gluten free cookie recipes, and include the amounts of ingredients.'
}], 'role' : 'user' }], 'config' : { 'response_mime_type' : 'application/json' , 'response_schema' : list [ Recipe ] } } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "structured-output-job-1" }, ) # wait for the job to finish job_name = inline_batch_job . name print ( f "Polling status for job: { job_name } " ) while True : batch_job_inline = client . batches . get ( name = job_name ) if batch_job_inline . state . name in ( 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' ): break print ( f "Job not finished. Current state: { batch_job_inline . state . name } . Waiting 30 seconds..." ) time . sleep ( 30 ) print ( f "Job finished with state: { batch_job_inline . state . name } " ) # print the response for i , inline_response in enumerate ( batch_job_inline . dest . inlined_responses , start = 1 ): print ( f " \n --- Response { i } ---" ) # Check for a successful response if inline_response . response : # The .text property is a shortcut to the generated text. print ( inline_response . response . text ) JavaScript import { GoogleGenAI , Type } from '@google/genai' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const inlinedRequests = [ { contents : [{ parts : [{ text : 'List a few popular cookie recipes, and include the amounts of ingredients.' }], role : 'user' }], config : { responseMimeType : 'application/json' , responseSchema : { type : Type . ARRAY , items : { type : Type . OBJECT , properties : { 'recipeName' : { type : Type . STRING , description : 'Name of the recipe' , nullable : false , }, 'ingredients' : { type : Type . ARRAY , items : { type : Type . STRING , description : 'Ingredients of the recipe' , nullable : false , }, }, }, required : [ 'recipeName' ], }, }, } }, { contents : [{ parts : [{ text : 'List a few popular gluten free cookie recipes, and include the amounts of ingredients.' }], role : 'user' }], config : { responseMimeType : 'application/json' , responseSchema : { type : Type . ARRAY , items : { type : Type . OBJECT , properties : { 'recipeName' : { type : Type . STRING , description : 'Name of the recipe' , nullable : false , }, 'ingredients' : { type : Type . ARRAY , items : { type : Type . STRING , description : 'Ingredients of the recipe' , nullable : false , }, }, }, required : [ 'recipeName' ], }, }, } } ] const inlinedBatchJob = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : inlinedRequests , config : { displayName : 'inlined-requests-job-1' , } }); Monitoring job status Use the operation name obtained when creating the batch job to poll its status. The state field of the batch job will indicate its current status. A batch job can be in one of the following states: JOB_STATE_PENDING : The job has been created and is waiting to be processed by the service. JOB_STATE_RUNNING : The job is in progress. JOB_STATE_SUCCEEDED : The job completed successfully. You can now retrieve the results. JOB_STATE_FAILED : The job failed. Check the error details for more information. JOB_STATE_CANCELLED : The job was cancelled by the user. JOB_STATE_EXPIRED : The job has expired because it was running or pending for more than 48 hours. The job will not have any results to retrieve. You can try submitting the job again or splitting up the requests into smaller batches. You can poll the job status periodically to check for completion. 
Python # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" # (e.g. 'batches/your-batch-id') batch_job = client . batches . get ( name = job_name ) completed_states = set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' , ]) print ( f "Polling status for job: { job_name } " ) batch_job = client . batches . get ( name = job_name ) # Initial get while batch_job . state . name not in completed_states : print ( f "Current state: { batch_job . state . name } " ) time . sleep ( 30 ) # Wait for 30 seconds before polling again batch_job = client . batches . get ( name = job_name ) print ( f "Job finished with state: { batch_job . state . name } " ) if batch_job . state . name == 'JOB_STATE_FAILED' : print ( f "Error: { batch_job . error } " ) JavaScript // Use the name of the job you want to check // e.g., inlinedBatchJob.name from the previous step let batchJob ; const completedStates = new Set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' , ]); try { batchJob = await ai . batches . get ({ name : inlinedBatchJob . name }); while ( ! completedStates . has ( batchJob . state )) { console . log ( `Current state: ${ batchJob . state } ` ); // Wait for 30 seconds before polling again await new Promise ( resolve = > setTimeout ( resolve , 30000 )); batchJob = await client . batches . get ({ name : batchJob . name }); } console . log ( `Job finished with state: ${ batchJob . state } ` ); if ( batchJob . state === 'JOB_STATE_FAILED' ) { // The exact structure of `error` might vary depending on the SDK // This assumes `error` is an object with a `message` property. console . error ( `Error: ${ batchJob . state } ` ); } } catch ( error ) { console . error ( `An error occurred while polling job ${ batchJob . name } :` , error ); } Retrieving results Once the job status indicates your batch job has succeeded, the results are available in the response field. Python import json # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" batch_job = client . batches . get ( name = job_name ) if batch_job . state . name == 'JOB_STATE_SUCCEEDED' : # If batch job was created with a file if batch_job . dest and batch_job . dest . file_name : # Results are in a file result_file_name = batch_job . dest . file_name print ( f "Results are in file: { result_file_name } " ) print ( "Downloading result file content..." ) file_content = client . files . download ( file = result_file_name ) # Process file_content (bytes) as needed print ( file_content . decode ( 'utf-8' )) # If batch job was created with inline request # (for embeddings, use batch_job.dest.inlined_embed_content_responses) elif batch_job . dest and batch_job . dest . inlined_responses : # Results are inline print ( "Results are inline:" ) for i , inline_response in enumerate ( batch_job . dest . inlined_responses ): print ( f "Response { i + 1 } :" ) if inline_response . response : # Accessing response, structure may vary. try : print ( inline_response . response . text ) except AttributeError : print ( inline_response . response ) # Fallback elif inline_response . error : print ( f "Error: { inline_response . error } " ) else : print ( "No results found (neither file nor inline)." ) else : print ( f "Job did not succeed. Final state: { batch_job . state . name } " ) if batch_job . error : print ( f "Error: { batch_job . 
error } " ) JavaScript // Use the name of the job you want to check // e.g., inlinedBatchJob.name from the previous step const jobName = "YOUR_BATCH_JOB_NAME" ; try { const batchJob = await ai . batches . get ({ name : jobName }); if ( batchJob . state === 'JOB_STATE_SUCCEEDED' ) { console . log ( 'Found completed batch:' , batchJob . displayName ); console . log ( batchJob ); // If batch job was created with a file destination if ( batchJob . dest ? . fileName ) { const resultFileName = batchJob . dest . fileName ; console . log ( `Results are in file: ${ resultFileName } ` ); console . log ( "Downloading result file content..." ); const fileContentBuffer = await ai . files . download ({ file : resultFileName }); // Process fileContentBuffer (Buffer) as needed console . log ( fileContentBuffer . toString ( 'utf-8' )); } // If batch job was created with inline responses else if ( batchJob . dest ? . inlinedResponses ) { console . log ( "Results are inline:" ); for ( let i = 0 ; i < batchJob . dest . inlinedResponses . length ; i ++ ) { const inlineResponse = batchJob . dest . inlinedResponses [ i ]; console . log ( `Response ${ i + 1 } :` ); if ( inlineResponse . response ) { // Accessing response, structure may vary. if ( inlineResponse . response . text !== undefined ) { console . log ( inlineResponse . response . text ); } else { console . log ( inlineResponse . response ); // Fallback } } else if ( inlineResponse . error ) { console . error ( `Error: ${ inlineResponse . error } ` ); } } } // If batch job was an embedding batch with inline responses else if ( batchJob . dest ? . inlinedEmbedContentResponses ) { console . log ( "Embedding results found inline:" ); for ( let i = 0 ; i < batchJob . dest . inlinedEmbedContentResponses . length ; i ++ ) { const inlineResponse = batchJob . dest . inlinedEmbedContentResponses [ i ]; console . log ( `Response ${ i + 1 } :` ); if ( inlineResponse . response ) { console . log ( inlineResponse . response ); } else if ( inlineResponse . error ) { console . error ( `Error: ${ inlineResponse . error } ` ); } } } else { console . log ( "No results found (neither file nor inline)." ); } } else { console . log ( `Job did not succeed. Final state: ${ batchJob . state } ` ); if ( batchJob . error ) { console . error ( `Error: ${ typeof batchJob . error === 'string' ? batchJob . error : batchJob . error . message || JSON . stringify ( batchJob . error ) } ` ); } } } catch ( error ) { console . 
error ( `An error occurred while processing job ${ jobName } :` , error ); } REST BATCH_NAME = "batches/123456" # Your batch job name curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null > batch_status.json if jq -r '.done' batch_status.json | grep -q "false" ; then echo "Batch has not finished processing" fi batch_state = $( jq -r '.metadata.state' batch_status.json ) if [[ $batch_state = "JOB_STATE_SUCCEEDED" ]] ; then if [[ $( jq '.response | has("inlinedResponses")' batch_status.json ) = "true" ]] ; then jq -r '.response.inlinedResponses' batch_status.json exit fi responses_file_name = $( jq -r '.response.responsesFile' batch_status.json ) curl https://generativelanguage.googleapis.com/download/v1beta/ $responses_file_name :download?alt = media \ -H "x-goog-api-key: $GEMINI_API_KEY " 2 > /dev/null elif [[ $batch_state = "JOB_STATE_FAILED" ]] ; then jq '.error' batch_status.json elif [[ $batch_state == "JOB_STATE_CANCELLED" ]] ; then echo "Batch was cancelled by the user" elif [[ $batch_state == "JOB_STATE_EXPIRED" ]] ; then echo "Batch expired after 48 hours" fi Cancelling a batch job You can cancel an ongoing batch job using its name. When a job is canceled, it stops processing new requests. Python # Cancel a batch job client . batches . cancel ( name = batch_job_to_cancel . name ) JavaScript await ai . batches . cancel ({ name : batchJobToCancel . name }); REST BATCH_NAME = "batches/123456" # Your batch job name # Cancel the batch curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :cancel \ -H "x-goog-api-key: $GEMINI_API_KEY " \ # Confirm that the status of the batch after cancellation is JOB_STATE_CANCELLED curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null | jq -r '.metadata.state' Deleting a batch job You can delete an existing batch job using its name. When a job is deleted, it stops processing new requests and is removed from the list of batch jobs. Python # Delete a batch job client . batches . delete ( name = batch_job_to_delete . name ) JavaScript await ai . batches . delete ({ name : batchJobToDelete . name }); REST BATCH_NAME = "batches/123456" # Your batch job name # Delete the batch job curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :delete \ -H "x-goog-api-key: $GEMINI_API_KEY " Technical details Supported models: Batch API supports a range of Gemini models. Refer to the Models page for each model's support of Batch API. The supported modalities for Batch API are the same as what's supported on the interactive (or non-batch) API. Pricing: Batch API usage is priced at 50% of the standard interactive API cost for the equivalent model. See the pricing page for details. Refer to the rate limits page for details on rate limits for this feature. Service Level Objective (SLO): Batch jobs are designed to complete within a 24-hour turnaround time. Many jobs may complete much faster depending on their size and current system load. Caching: Context caching is enabled for batch requests. If a request in your batch results in a cache hit, the cached tokens are priced the same as for non-batch API traffic. Best practices Use input files for large requests: For a large number of requests, always use the file input method for better manageability and to avoid hitting request size limits for the BatchGenerateContent call itself. 
Note that there's a 2GB file size limit per input file. Error handling: Check the batchStats for failedRequestCount after a job completes. If using file output, parse each line to check if it's a GenerateContentResponse or a status object indicating an error for that specific request, as shown in the sketch below. See the troubleshooting guide for a complete set of error codes. Submit jobs once: The creation of a batch job is not idempotent. If you send the same creation request twice, two separate batch jobs will be created. Break up very large batches: While the target turnaround time is 24 hours, actual processing time can vary based on system load and job size. For large jobs, consider breaking them into smaller batches if intermediate results are needed sooner. What's next Check out the Batch API notebook for more examples. The OpenAI compatibility layer supports Batch API. Read the examples on the OpenAI Compatibility page. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-12 UTC.
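The error-handling practice above can be turned into a short script. The following is a minimal sketch, not taken from the official page: it assumes the batch result file has already been downloaded (for example via client.files.download as shown in the retrieving-results example) and saved locally as batch_results.jsonl, and it assumes each output line carries the user-defined key plus either a response or an error object, as described above. The file path and the exact response field layout are illustrative assumptions, not guaranteed API behavior.

Python
import json

failed_count = 0
succeeded_count = 0

# "batch_results.jsonl" is a placeholder path for the downloaded result file.
with open("batch_results.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        key = entry.get("key", "<no key>")
        if entry.get("error"):
            # A status object means this individual request failed.
            failed_count += 1
            print(f"{key}: FAILED -> {entry['error']}")
        else:
            # Otherwise the line should carry a GenerateContentResponse.
            succeeded_count += 1
            response = entry.get("response", {})
            try:
                text = response["candidates"][0]["content"]["parts"][0]["text"]
            except (KeyError, IndexError, TypeError):
                text = str(response)[:80]  # fall back to the raw payload
            print(f"{key}: OK -> {text[:80]}")

print(f"Succeeded: {succeeded_count}, failed: {failed_count}")
# failed_count can be compared against the batchStats failedRequestCount reported by the job.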
text_content/docs_batch-mode_d49d3e09.txt CHANGED
@@ -1,5 +1,5 @@
1
  URL: https://ai.google.dev/gemini-api/docs/batch-mode#inline-requests
2
- Title: Batch Mode  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Batch Mode | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Batch Mode The Gemini API's Batch Mode is designed to process large volumes of requests asynchronously at 50% of the standard cost . The target turnaround time is 24 hours, but in majority of cases, it is much quicker. Use Batch Mode for large-scale, non-urgent tasks such as data pre-processing or running evaluations where an immediate response is not required. Note: You can use Batch Mode with the Gemini API Python SDK or the REST API. Support for Batch Mode in the Gemini API JavaScript SDK is coming soon. Getting Started This section helps you get started with submitting your first requests in batch mode. Creating a batch job You have two ways to submit your requests in Batch Mode: Inline Requests : A list of GenerateContentRequest objects directly included in your batch creation request. This is suitable for smaller batches that keep the total request size under 20MB. The output returned from the model is a list of inlineResponse objects. Input File : A JSON Lines (JSONL) file where each line contains a complete GenerateContentRequest object. This method is recommended for larger requests. The output returned from the model is a JSONL file where each line is either a GenerateContentResponse or a status object. Inline requests For a small number of requests, you can directly embed the GenerateContentRequest objects within your BatchGenerateContentRequest . The following example calls the BatchGenerateContent method with inline requests: Python from google import genai from google.genai import types client = genai . Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Tell me a one-sentence joke.' }], 'role' : 'user' }] }, { 'contents' : [{ 'parts' : [{ 'text' : 'Why is the sky blue?' }], 'role' : 'user' }] } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "inlined-requests-job-1" , }, ) print ( f "Created batch job: { inline_batch_job . name } " ) REST curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -X POST \ -H "Content-Type:application/json" \ -d '{ "batch": { "display_name": "my-batch-requests", "input_config": { "requests": { "requests": [ { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-1" } }, { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-2" } } ] } } } }' You can use any requests you would use in non-batch (or interactive) mode. For example, you could specify the temperature, system instructions or even pass in other modalities. The following example shows some example inline requests that contain a system instruction for one of the requests: inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cloud.' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cat.' 
}]}], 'system_instructions' : { 'parts' : [{ 'text' : 'You are a cat. Your name is Neko.' }]}} ] Similarly can also specify tools to use for a request. The following example shows a request that enables the Google Search tool : inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 1998?' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 2025?' }]}], 'tools' : [{ 'google_search ' : {}}]} ] Input file For larger sets of requests, prepare a JSON Lines (JSONL) file. Each line in this file must be a JSON object containing a user-defined key and a request object, where the request is a valid GenerateContentRequest object. The user-defined key is used in the response to indicate which output is the result of which request. For example, the request with the key defined as request-1 will have its response annotated with the same key name. This file is uploaded using the File API . The maximum allowed file size for an input file is 2GB. The following is an example of a JSONL file. You can save it in a file named my-batch-requests.json : { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}], "generation_config" : { "temperature" : 0.7 }}} { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} Similarly to inline requests, you can specify other parameters like system instructions, tools or other configurations in each request JSON. You can upload this file using the File API as shown in the following example. If you are working with multimodal input, you can reference other uploaded files within your JSONL file. Python from google import genai from google.genai import types client = genai . Client () # Create a sample JSONL file with open ( "my-batch-requests.jsonl" , "w" ) as f : requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}]}}, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} ] for req in requests : f . write ( json . dumps ( req ) + " \n " ) # Upload the file to the File API uploaded_file = client . files . upload ( file = 'my-batch-requests.jsonl' , config = types . UploadFileConfig ( display_name = 'my-batch-requests' , mime_type = 'jsonl' ) ) print ( f "Uploaded file: { uploaded_file . name } " ) REST tmp_batch_input_file = batch_input.tmp echo -e '{"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generationConfig": {"temperature": 0.7}}\n{"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}' > batch_input.tmp MIME_TYPE = $( file -b --mime-type " ${ tmp_batch_input_file } " ) NUM_BYTES = $( wc -c < " ${ tmp_batch_input_file } " ) DISPLAY_NAME = BatchInput tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. 
curl "https://generativelanguage.googleapis.com/upload/v1beta/files \ -D " ${ tmp_header_file } " \ -H " x-goog-api-key: $GEMINI_API_KEY " \ -H " X-Goog-Upload-Protocol: resumable " \ -H " X-Goog-Upload-Command: start " \ -H " X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H " X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H " Content-Type: application/jsonl " \ -d " { 'file' : { 'display_name' : '${DISPLAY_NAME}' }} " 2> /dev/null upload_url= $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H " Content-Length: ${ NUM_BYTES } " \ -H " X-Goog-Upload-Offset: 0 " \ -H " X-Goog-Upload-Command: upload, finalize " \ --data-binary " @ ${ tmp_batch_input_file } " 2> /dev/null > file_info.json file_uri= $( jq ".file.uri" file_info.json ) The following example calls the BatchGenerateContent method with the input file uploaded using File API: Python # Assumes `uploaded_file` is the file object from the previous step file_batch_job = client . batches . create ( model = "gemini-2.5-flash" , src = uploaded_file . name , config = { 'display_name' : "file-upload-job-1" , }, ) print ( f "Created batch job: { file_batch_job . name } " ) REST BATCH_INPUT_FILE = 'files/123456' # File ID curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -X POST \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" \ -d "{ 'batch': { 'display_name': 'my-batch-requests', 'input_config': { 'requests': { 'file_name': ${ BATCH_INPUT_FILE } } } } }" When you create a batch job, you will get a job name returned. Use this name for monitoring the job status as well as retrieving the results once the job completes. The following is an example output that contains a job name: Created batch job from file: batches/123456789 Monitoring job status Use the operation name obtained when creating the batch job to poll its status. The state field of the batch job will indicate its current status. A batch job can be in one of the following states: JOB_STATE_PENDING : The job has been created and is waiting to be processed by the service. JOB_STATE_SUCCEEDED : The job completed successfully. You can now retrieve the results. JOB_STATE_FAILED : The job failed. Check the error details for more information. JOB_STATE_CANCELLED : The job was cancelled by the user. You can poll the job status periodically to check for completion. Python # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" # (e.g. 'batches/your-batch-id') batch_job = client . batches . get ( name = job_name ) completed_states = set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , ]) print ( f "Polling status for job: { job_name } " ) batch_job = client . batches . get ( name = job_name ) # Initial get while batch_job . state . name not in completed_states : print ( f "Current state: { batch_job . state . name } " ) time . sleep ( 30 ) # Wait for 30 seconds before polling again batch_job = client . batches . get ( name = job_name ) print ( f "Job finished with state: { batch_job . state . name } " ) if batch_job . state . name == 'JOB_STATE_FAILED' : print ( f "Error: { batch_job . error } " ) Retrieving results Once the job status indicates your batch job has succeeded, the results are available in the response field. 
Python import json # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" batch_job = client . batches . get ( name = job_name ) if batch_job . state . name == 'JOB_STATE_SUCCEEDED' : # If batch job was created with a file if batch_job . dest and batch_job . dest . file_name : # Results are in a file result_file_name = batch_job . dest . file_name print ( f "Results are in file: { result_file_name } " ) print ( "Downloading result file content..." ) file_content = client . files . download ( file = result_file_name ) # Process file_content (bytes) as needed print ( file_content . decode ( 'utf-8' )) # If batch job was created with inline request elif batch_job . dest and batch_job . dest . inlined_responses : # Results are inline print ( "Results are inline:" ) for i , inline_response in enumerate ( batch_job . dest . inlined_responses ): print ( f "Response { i + 1 } :" ) if inline_response . response : # Accessing response, structure may vary. try : print ( inline_response . response . text ) except AttributeError : print ( inline_response . response ) # Fallback elif inline_response . error : print ( f "Error: { inline_response . error } " ) else : print ( "No results found (neither file nor inline)." ) else : print ( f "Job did not succeed. Final state: { batch_job . state . name } " ) if batch_job . error : print ( f "Error: { batch_job . error } " ) REST BATCH_NAME = "batches/123456" # Your batch job name curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null > batch_status.json if jq -r '.done' batch_status.json | grep -q "false" ; then echo "Batch has not finished processing" fi batch_state = $( jq -r '.metadata.state' batch_status.json ) if [[ $batch_state = "JOB_STATE_SUCCEEDED" ]] ; then if [[ $( jq '.response | has("inlinedResponses")' batch_status.json ) = "true" ]] ; then jq -r '.response.inlinedResponses' batch_status.json exit fi responses_file_name = $( jq -r '.response.responsesFile' batch_status.json ) curl https://generativelanguage.googleapis.com/download/v1beta/ $responses_file_name :download?alt = media \ -H "x-goog-api-key: $GEMINI_API_KEY " 2 > /dev/null elif [[ $batch_state = "JOB_STATE_FAILED" ]] ; then jq '.error' batch_status.json elif [[ $batch_state == "JOB_STATE_CANCELLED" ]] ; then echo "Batch was cancelled by the user" fi Cancelling a batch job You can cancel an ongoing batch job using its name. When a job is canceled, it stops processing new requests. Python # Cancel a batch job client . batches . cancel ( name = batch_job_to_cancel . name ) REST BATCH_NAME = "batches/123456" # Your batch job name # Cancel the batch curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :cancel \ -H "x-goog-api-key: $GEMINI_API_KEY " \ # Confirm that the status of the batch after cancellation is JOB_STATE_CANCELLED curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null | jq -r '.metadata.state' Deleting a batch job You can delete an existing batch job using its name. When a job is deleted, it stops processing new requests and is removed from the list of batch jobs. Python # Delete a batch job client . batches . delete ( name = batch_job_to_delete . 
name ) REST BATCH_NAME = "batches/123456" # Your batch job name # Cancel the batch curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :delete \ -H "x-goog-api-key: $GEMINI_API_KEY " \ Technical details Supported models: Batch Mode supports a range of Gemini models. Refer to the Models page for the latest list of compatible models. The supported modalities for Batch Mode are the same as what's supported on the interactive (or non-batch mode) API. Pricing: Batch Mode usage is priced at 50% of the standard interactive API cost for the equivalent model. Service Level Objective (SLO): Batch jobs are designed to complete within a 24-hour turnaround time. Many jobs may complete much faster depending on their size and current system load. Caching: Context caching is enabled for batch requests. If a request in your batch results in a cache hit, the cached tokens are priced the same as for non-batch mode traffic. Best practices Use input files for large requests: For a large number of requests, always use the file input method for better manageability and to avoid hitting request size limits for the BatchGenerateContent call itself. Note that there's a the 2GB file size limit per input file. Error handling: Check the batchStats for failedRequestCount after a job completes. If using file output, parse each line to check if it's a GenerateContentResponse or a status object indicating an error for that specific request. Submit jobs once: The creation of a batch job is not idempotent. If you send the same creation request twice, two separate batch jobs will be created. Break up very large batches: While the target turnaround time is 24 hours, actual processing time can vary based on system load and job size. For large jobs, consider breaking them into smaller batches if intermediate results are needed sooner. What's next Check out the batch mode notebook for more examples. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-07 UTC.
 
1
  URL: https://ai.google.dev/gemini-api/docs/batch-mode#inline-requests
2
+ Title: Batch API  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Batch API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Batch API The Gemini Batch API is designed to process large volumes of requests asynchronously at 50% of the standard cost . The target turnaround time is 24 hours, but in majority of cases, it is much quicker. Use Batch API for large-scale, non-urgent tasks such as data pre-processing or running evaluations where an immediate response is not required. Creating a batch job You have two ways to submit your requests in Batch API: Inline Requests : A list of GenerateContentRequest objects directly included in your batch creation request. This is suitable for smaller batches that keep the total request size under 20MB. The output returned from the model is a list of inlineResponse objects. Input File : A JSON Lines (JSONL) file where each line contains a complete GenerateContentRequest object. This method is recommended for larger requests. The output returned from the model is a JSONL file where each line is either a GenerateContentResponse or a status object. Inline requests For a small number of requests, you can directly embed the GenerateContentRequest objects within your BatchGenerateContentRequest . The following example calls the BatchGenerateContent method with inline requests: Python from google import genai from google.genai import types client = genai . Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Tell me a one-sentence joke.' }], 'role' : 'user' }] }, { 'contents' : [{ 'parts' : [{ 'text' : 'Why is the sky blue?' }], 'role' : 'user' }] } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "inlined-requests-job-1" , }, ) print ( f "Created batch job: { inline_batch_job . name } " ) JavaScript import { GoogleGenAI } from '@google/genai' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const inlinedRequests = [ { contents : [{ parts : [{ text : 'Tell me a one-sentence joke.' }], role : 'user' }] }, { contents : [{ parts : [{ 'text' : 'Why is the sky blue?' }], role : 'user' }] } ] const response = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : inlinedRequests , config : { displayName : 'inlined-requests-job-1' , } }); console . log ( response ); REST curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -X POST \ -H "Content-Type:application/json" \ -d '{ "batch": { "display_name": "my-batch-requests", "input_config": { "requests": { "requests": [ { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-1" } }, { "request": {"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}]}, "metadata": { "key": "request-2" } } ] } } } }' Input file For larger sets of requests, prepare a JSON Lines (JSONL) file. 
Each line in this file must be a JSON object containing a user-defined key and a request object, where the request is a valid GenerateContentRequest object. The user-defined key is used in the response to indicate which output is the result of which request. For example, the request with the key defined as request-1 will have its response annotated with the same key name. This file is uploaded using the File API . The maximum allowed file size for an input file is 2GB. The following is an example of a JSONL file. You can save it in a file named my-batch-requests.jsonl : { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}], "generation_config" : { "temperature" : 0.7 }}} { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} Similarly to inline requests, you can specify other parameters like system instructions, tools or other configurations in each request JSON. You can upload this file using the File API as shown in the following example. If you are working with multimodal input, you can reference other uploaded files within your JSONL file. Python import json from google import genai from google.genai import types client = genai . Client () # Create a sample JSONL file with open ( "my-batch-requests.jsonl" , "w" ) as f : requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }]}]}}, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }]}]}} ] for req in requests : f . write ( json . dumps ( req ) + " \n " ) # Upload the file to the File API uploaded_file = client . files . upload ( file = 'my-batch-requests.jsonl' , config = types . UploadFileConfig ( display_name = 'my-batch-requests' , mime_type = 'jsonl' ) ) print ( f "Uploaded file: { uploaded_file . name } " ) JavaScript import { GoogleGenAI } from '@google/genai' ; import * as fs from "fs" ; import * as path from "path" ; import { fileURLToPath } from 'url' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const fileName = "my-batch-requests.jsonl" ; // Define the requests const requests = [ { "key" : "request-1" , "request" : { "contents" : [{ "parts" : [{ "text" : "Describe the process of photosynthesis." }] }] } }, { "key" : "request-2" , "request" : { "contents" : [{ "parts" : [{ "text" : "What are the main ingredients in a Margherita pizza?" }] }] } } ]; // Construct the full path to file const __filename = fileURLToPath ( import . meta . url ); const __dirname = path . dirname ( __filename ); const filePath = path . join ( __dirname , fileName ); // __dirname is the directory of the current script async function writeBatchRequestsToFile ( requests , filePath ) { try { // Use a writable stream for efficiency, especially with larger files. const writeStream = fs . createWriteStream ( filePath , { flags : 'w' }); writeStream . on ( 'error' , ( err ) = > { console . error ( `Error writing to file ${ filePath } :` , err ); }); for ( const req of requests ) { writeStream . write ( JSON . stringify ( req ) + '\n' ); } writeStream . end (); console . log ( `Successfully wrote batch requests to ${ filePath } ` ); } catch ( error ) { // This catch block is for errors that might occur before stream setup, // stream errors are handled by the 'error' event. console .
error ( `An unexpected error occurred:` , error ); } } // Write to a file. writeBatchRequestsToFile ( requests , filePath ); // Upload the file to the File API. const uploadedFile = await ai . files . upload ({ file : 'my-batch-requests.jsonl' , config : { mimeType : 'jsonl' , }}); console . log ( uploadedFile . name ); REST tmp_batch_input_file = batch_input.tmp echo -e '{"contents": [{"parts": [{"text": "Describe the process of photosynthesis."}]}], "generationConfig": {"temperature": 0.7}}\n{"contents": [{"parts": [{"text": "What are the main ingredients in a Margherita pizza?"}]}]}' > batch_input.tmp MIME_TYPE = $( file -b --mime-type " ${ tmp_batch_input_file } " ) NUM_BYTES = $( wc -c < " ${ tmp_batch_input_file } " ) DISPLAY_NAME = BatchInput tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -D " ${ tmp_header_file } " \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/jsonl" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ tmp_batch_input_file } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) The following example calls the BatchGenerateContent method with the input file uploaded using File API: Python # Assumes `uploaded_file` is the file object from the previous step file_batch_job = client . batches . create ( model = "gemini-2.5-flash" , src = uploaded_file . name , config = { 'display_name' : "file-upload-job-1" , }, ) print ( f "Created batch job: { file_batch_job . name } " ) JavaScript // Assumes `uploadedFile` is the file object from the previous step const fileBatchJob = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : uploadedFile . name , config : { displayName : 'file-upload-job-1' , } }); console . log ( fileBatchJob ); REST # Set the File ID taken from the upload response. BATCH_INPUT_FILE = 'files/123456' curl https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:batchGenerateContent \ -X POST \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" \ -d "{ 'batch': { 'display_name': 'my-batch-requests', 'input_config': { 'file_name': ' ${ BATCH_INPUT_FILE } ' } } }" When you create a batch job, you will get a job name returned. Use this name for monitoring the job status as well as retrieving the results once the job completes. The following is an example output that contains a job name: Created batch job from file: batches/123456789 Batch embedding support You can use the Batch API to interact with the Embeddings model for higher throughput. To create an embeddings batch job with either inline requests or input files , use the batches.create_embeddings API and specify the embeddings model. Python # Creating an embeddings batch job with an input file request: file_job = client . batches . 
create_embeddings ( model = "gemini-embedding-001" , src = { 'file_name' : uploaded_batch_requests . name }, config = { 'display_name' : "Input embeddings batch" }, ) # Creating an embeddings batch job with an inline request: batch_job = client . batches . create_embeddings ( model = "gemini-embedding-001" , # For a predefined list of requests `inlined_requests` src = { 'inlined_requests' : inlined_requests }, config = { 'display_name' : "Inlined embeddings batch" }, ) JavaScript // Creating an embeddings batch job with an input file request: let fileJob ; fileJob = await ai . batches . createEmbeddings ({ model : 'gemini-embedding-001' , src : { fileName : uploadedBatchRequests . name }, config : { displayName : 'Input embeddings batch' }, }); console . log ( `Created batch job: ${ fileJob . name } ` ); // Creating an embeddings batch job with an inline request: let batchJob ; batchJob = await ai . batches . createEmbeddings ({ model : 'gemini-embedding-001' , // For a predefined list of requests `inlinedRequests` src : { inlinedRequests : inlinedRequests }, config : { displayName : 'Inlined embeddings batch' }, }); console . log ( `Created batch job: ${ batchJob . name } ` ); Read the Embeddings section in the Batch API cookbook for more examples. Request configuration You can include any request configurations you would use in a standard non-batch request. For example, you could specify the temperature, system instructions or even pass in other modalities. The following example shows an inline request that contains a system instruction for one of the requests: Python inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cloud.' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Write a short poem about a cat.' }]}], 'system_instructions' : { 'parts' : [{ 'text' : 'You are a cat. Your name is Neko.' }]}} ] JavaScript inlineRequestsList = [ { contents : [{ parts : [{ text : 'Write a short poem about a cloud.' }]}]}, { contents : [{ parts : [{ text : 'Write a short poem about a cat.' }]}], systemInstructions : { parts : [{ text : 'You are a cat. Your name is Neko.' }]}} ] Similarly, you can specify tools to use for a request. The following example shows a request that enables the Google Search tool : Python inline_requests_list = [ { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 1998?' }]}]}, { 'contents' : [{ 'parts' : [{ 'text' : 'Who won the euro 2025?' }]}], 'tools' : [{ 'google_search' : {}}]} ] JavaScript inlineRequestsList = [ { contents : [{ parts : [{ text : 'Who won the euro 1998?' }]}]}, { contents : [{ parts : [{ text : 'Who won the euro 2025?' }]}], tools : [{ googleSearch : {}}]} ] You can specify structured output as well. The following example shows how to specify structured output for your batch requests. Python from google import genai from pydantic import BaseModel , TypeAdapter class Recipe ( BaseModel ): recipe_name : str ingredients : list [ str ] client = genai . Client () # A list of dictionaries, where each is a GenerateContentRequest inline_requests = [ { 'contents' : [{ 'parts' : [{ 'text' : 'List a few popular cookie recipes, and include the amounts of ingredients.' }], 'role' : 'user' }], 'config' : { 'response_mime_type' : 'application/json' , 'response_schema' : list [ Recipe ] } }, { 'contents' : [{ 'parts' : [{ 'text' : 'List a few popular gluten free cookie recipes, and include the amounts of ingredients.'
}], 'role' : 'user' }], 'config' : { 'response_mime_type' : 'application/json' , 'response_schema' : list [ Recipe ] } } ] inline_batch_job = client . batches . create ( model = "models/gemini-2.5-flash" , src = inline_requests , config = { 'display_name' : "structured-output-job-1" }, ) # wait for the job to finish job_name = inline_batch_job . name print ( f "Polling status for job: { job_name } " ) while True : batch_job_inline = client . batches . get ( name = job_name ) if batch_job_inline . state . name in ( 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' ): break print ( f "Job not finished. Current state: { batch_job_inline . state . name } . Waiting 30 seconds..." ) time . sleep ( 30 ) print ( f "Job finished with state: { batch_job_inline . state . name } " ) # print the response for i , inline_response in enumerate ( batch_job_inline . dest . inlined_responses , start = 1 ): print ( f " \n --- Response { i } ---" ) # Check for a successful response if inline_response . response : # The .text property is a shortcut to the generated text. print ( inline_response . response . text ) JavaScript import { GoogleGenAI , Type } from '@google/genai' ; const GEMINI_API_KEY = process . env . GEMINI_API_KEY ; const ai = new GoogleGenAI ({ apiKey : GEMINI_API_KEY }); const inlinedRequests = [ { contents : [{ parts : [{ text : 'List a few popular cookie recipes, and include the amounts of ingredients.' }], role : 'user' }], config : { responseMimeType : 'application/json' , responseSchema : { type : Type . ARRAY , items : { type : Type . OBJECT , properties : { 'recipeName' : { type : Type . STRING , description : 'Name of the recipe' , nullable : false , }, 'ingredients' : { type : Type . ARRAY , items : { type : Type . STRING , description : 'Ingredients of the recipe' , nullable : false , }, }, }, required : [ 'recipeName' ], }, }, } }, { contents : [{ parts : [{ text : 'List a few popular gluten free cookie recipes, and include the amounts of ingredients.' }], role : 'user' }], config : { responseMimeType : 'application/json' , responseSchema : { type : Type . ARRAY , items : { type : Type . OBJECT , properties : { 'recipeName' : { type : Type . STRING , description : 'Name of the recipe' , nullable : false , }, 'ingredients' : { type : Type . ARRAY , items : { type : Type . STRING , description : 'Ingredients of the recipe' , nullable : false , }, }, }, required : [ 'recipeName' ], }, }, } } ] const inlinedBatchJob = await ai . batches . create ({ model : 'gemini-2.5-flash' , src : inlinedRequests , config : { displayName : 'inlined-requests-job-1' , } }); Monitoring job status Use the operation name obtained when creating the batch job to poll its status. The state field of the batch job will indicate its current status. A batch job can be in one of the following states: JOB_STATE_PENDING : The job has been created and is waiting to be processed by the service. JOB_STATE_RUNNING : The job is in progress. JOB_STATE_SUCCEEDED : The job completed successfully. You can now retrieve the results. JOB_STATE_FAILED : The job failed. Check the error details for more information. JOB_STATE_CANCELLED : The job was cancelled by the user. JOB_STATE_EXPIRED : The job has expired because it was running or pending for more than 48 hours. The job will not have any results to retrieve. You can try submitting the job again or splitting up the requests into smaller batches. You can poll the job status periodically to check for completion. 
Python # Use the name of the job you want to check # e.g., inline_batch_job.name from the previous step job_name = "YOUR_BATCH_JOB_NAME" # (e.g. 'batches/your-batch-id') batch_job = client . batches . get ( name = job_name ) completed_states = set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' , ]) print ( f "Polling status for job: { job_name } " ) batch_job = client . batches . get ( name = job_name ) # Initial get while batch_job . state . name not in completed_states : print ( f "Current state: { batch_job . state . name } " ) time . sleep ( 30 ) # Wait for 30 seconds before polling again batch_job = client . batches . get ( name = job_name ) print ( f "Job finished with state: { batch_job . state . name } " ) if batch_job . state . name == 'JOB_STATE_FAILED' : print ( f "Error: { batch_job . error } " ) JavaScript // Use the name of the job you want to check // e.g., inlinedBatchJob.name from the previous step let batchJob ; const completedStates = new Set ([ 'JOB_STATE_SUCCEEDED' , 'JOB_STATE_FAILED' , 'JOB_STATE_CANCELLED' , 'JOB_STATE_EXPIRED' , ]); try { batchJob = await ai . batches . get ({ name : inlinedBatchJob . name }); while ( ! completedStates . has ( batchJob . state )) { console . log ( `Current state: ${ batchJob . state } ` ); // Wait for 30 seconds before polling again await new Promise ( resolve = > setTimeout ( resolve , 30000 )); batchJob = await ai . batches . get ({ name : batchJob . name }); } console . log ( `Job finished with state: ${ batchJob . state } ` ); if ( batchJob . state === 'JOB_STATE_FAILED' ) { // The exact structure of `error` might vary depending on the SDK // Log whatever error details the job reports. console . error ( `Error: ${ batchJob . error } ` ); } } catch ( error ) { console . error ( `An error occurred while polling job ${ batchJob . name } :` , error ); } Retrieving results Once the job status indicates your batch job has succeeded, the results are available in the response field.
error } " ) JavaScript // Use the name of the job you want to check // e.g., inlinedBatchJob.name from the previous step const jobName = "YOUR_BATCH_JOB_NAME" ; try { const batchJob = await ai . batches . get ({ name : jobName }); if ( batchJob . state === 'JOB_STATE_SUCCEEDED' ) { console . log ( 'Found completed batch:' , batchJob . displayName ); console . log ( batchJob ); // If batch job was created with a file destination if ( batchJob . dest ? . fileName ) { const resultFileName = batchJob . dest . fileName ; console . log ( `Results are in file: ${ resultFileName } ` ); console . log ( "Downloading result file content..." ); const fileContentBuffer = await ai . files . download ({ file : resultFileName }); // Process fileContentBuffer (Buffer) as needed console . log ( fileContentBuffer . toString ( 'utf-8' )); } // If batch job was created with inline responses else if ( batchJob . dest ? . inlinedResponses ) { console . log ( "Results are inline:" ); for ( let i = 0 ; i < batchJob . dest . inlinedResponses . length ; i ++ ) { const inlineResponse = batchJob . dest . inlinedResponses [ i ]; console . log ( `Response ${ i + 1 } :` ); if ( inlineResponse . response ) { // Accessing response, structure may vary. if ( inlineResponse . response . text !== undefined ) { console . log ( inlineResponse . response . text ); } else { console . log ( inlineResponse . response ); // Fallback } } else if ( inlineResponse . error ) { console . error ( `Error: ${ inlineResponse . error } ` ); } } } // If batch job was an embedding batch with inline responses else if ( batchJob . dest ? . inlinedEmbedContentResponses ) { console . log ( "Embedding results found inline:" ); for ( let i = 0 ; i < batchJob . dest . inlinedEmbedContentResponses . length ; i ++ ) { const inlineResponse = batchJob . dest . inlinedEmbedContentResponses [ i ]; console . log ( `Response ${ i + 1 } :` ); if ( inlineResponse . response ) { console . log ( inlineResponse . response ); } else if ( inlineResponse . error ) { console . error ( `Error: ${ inlineResponse . error } ` ); } } } else { console . log ( "No results found (neither file nor inline)." ); } } else { console . log ( `Job did not succeed. Final state: ${ batchJob . state } ` ); if ( batchJob . error ) { console . error ( `Error: ${ typeof batchJob . error === 'string' ? batchJob . error : batchJob . error . message || JSON . stringify ( batchJob . error ) } ` ); } } } catch ( error ) { console . 
error ( `An error occurred while processing job ${ jobName } :` , error ); } REST BATCH_NAME = "batches/123456" # Your batch job name curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null > batch_status.json if jq -r '.done' batch_status.json | grep -q "false" ; then echo "Batch has not finished processing" fi batch_state = $( jq -r '.metadata.state' batch_status.json ) if [[ $batch_state = "JOB_STATE_SUCCEEDED" ]] ; then if [[ $( jq '.response | has("inlinedResponses")' batch_status.json ) = "true" ]] ; then jq -r '.response.inlinedResponses' batch_status.json exit fi responses_file_name = $( jq -r '.response.responsesFile' batch_status.json ) curl https://generativelanguage.googleapis.com/download/v1beta/ $responses_file_name :download?alt = media \ -H "x-goog-api-key: $GEMINI_API_KEY " 2 > /dev/null elif [[ $batch_state = "JOB_STATE_FAILED" ]] ; then jq '.error' batch_status.json elif [[ $batch_state == "JOB_STATE_CANCELLED" ]] ; then echo "Batch was cancelled by the user" elif [[ $batch_state == "JOB_STATE_EXPIRED" ]] ; then echo "Batch expired after 48 hours" fi Cancelling a batch job You can cancel an ongoing batch job using its name. When a job is canceled, it stops processing new requests. Python # Cancel a batch job client . batches . cancel ( name = batch_job_to_cancel . name ) JavaScript await ai . batches . cancel ({ name : batchJobToCancel . name }); REST BATCH_NAME = "batches/123456" # Your batch job name # Cancel the batch curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :cancel \ -H "x-goog-api-key: $GEMINI_API_KEY " \ # Confirm that the status of the batch after cancellation is JOB_STATE_CANCELLED curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type:application/json" 2 > /dev/null | jq -r '.metadata.state' Deleting a batch job You can delete an existing batch job using its name. When a job is deleted, it stops processing new requests and is removed from the list of batch jobs. Python # Delete a batch job client . batches . delete ( name = batch_job_to_delete . name ) JavaScript await ai . batches . delete ({ name : batchJobToDelete . name }); REST BATCH_NAME = "batches/123456" # Your batch job name # Delete the batch job curl https://generativelanguage.googleapis.com/v1beta/ $BATCH_NAME :delete \ -H "x-goog-api-key: $GEMINI_API_KEY " Technical details Supported models: Batch API supports a range of Gemini models. Refer to the Models page for each model's support of Batch API. The supported modalities for Batch API are the same as what's supported on the interactive (or non-batch) API. Pricing: Batch API usage is priced at 50% of the standard interactive API cost for the equivalent model. See the pricing page for details. Refer to the rate limits page for details on rate limits for this feature. Service Level Objective (SLO): Batch jobs are designed to complete within a 24-hour turnaround time. Many jobs may complete much faster depending on their size and current system load. Caching: Context caching is enabled for batch requests. If a request in your batch results in a cache hit, the cached tokens are priced the same as for non-batch API traffic. Best practices Use input files for large requests: For a large number of requests, always use the file input method for better manageability and to avoid hitting request size limits for the BatchGenerateContent call itself. 
Note that there's a 2GB file size limit per input file. Error handling: Check the batchStats for failedRequestCount after a job completes. If using file output, parse each line to check if it's a GenerateContentResponse or a status object indicating an error for that specific request. See the troubleshooting guide for a complete set of error codes. Submit jobs once: The creation of a batch job is not idempotent. If you send the same creation request twice, two separate batch jobs will be created. Break up very large batches: While the target turnaround time is 24 hours, actual processing time can vary based on system load and job size. For large jobs, consider breaking them into smaller batches if intermediate results are needed sooner, as shown in the sketch below. What's next Check out the Batch API notebook for more examples. The OpenAI compatibility layer supports Batch API. Read the examples on the OpenAI Compatibility page. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-12 UTC.
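As a companion to the "Break up very large batches" practice above, the following is a minimal sketch, not taken from the official page, of splitting one large request list into several smaller jobs. It reuses the client.batches.create call shown earlier on this page; the request texts, chunk size, and display names are illustrative assumptions. For genuinely large workloads, each chunk would typically be written to its own JSONL file and submitted through the File API instead of inline.

Python
from google import genai

client = genai.Client()

# Illustrative: a large list of GenerateContentRequest dictionaries built elsewhere.
all_requests = [
    {'contents': [{'parts': [{'text': f'Write one sentence about topic #{i}.'}], 'role': 'user'}]}
    for i in range(3000)
]

CHUNK_SIZE = 500  # illustrative; choose a size that fits your turnaround needs

job_names = []
for start in range(0, len(all_requests), CHUNK_SIZE):
    chunk = all_requests[start:start + CHUNK_SIZE]
    job = client.batches.create(
        model="models/gemini-2.5-flash",
        src=chunk,
        config={'display_name': f"chunked-job-{start // CHUNK_SIZE + 1}"},
    )
    job_names.append(job.name)
    print(f"Created {job.name} with {len(chunk)} requests")

# Each job in job_names can then be polled independently,
# as shown in the "Monitoring job status" section.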
text_content/docs_billing_55ad0493.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/billing#request-an-upgrade
2
  Title: Billing  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Billing | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Billing This guide provides an overview of different Gemini API billing options, explains how to enable billing and monitor usage, and provides answers to frequently asked questions (FAQs) about billing. Upgrade to the Gemini API paid tier About billing Billing for the Gemini API is based on two pricing tiers: free of charge (or free ) and pay-as-you-go (or paid ). Pricing and rate limits differ between these tiers and also vary by model. You can check out the rate limits and pricing pages for more into. For a model-by-model breakdown of capabilities, see the Gemini models page . How to request an upgrade To transition from the free tier to the pay-as-you-go plan, you need to enable billing for your Google Cloud project. The button you see in Google AI Studio depends on your project's current plan. If you're on the free tier, you'll see a Set up Billing button for your project. If you're already on the paid tier and meet the criteria for a plan change, you might see an Upgrade button. To start the process, follow these steps: Go to the AI Studio API keys page . Find the project you want to move to the paid plan and click either Set up Billing or Upgrade , depending on the button displayed. The next step depends on the button you clicked: If you clicked Set up Billing: You'll be redirected to the Google Cloud console to link a billing account to your project. Follow the on-screen instructions to complete the process. If you clicked Upgrade: The system will automatically verify your project's eligibility. If your project meets all the requirements, it will be instantly upgraded to the next tier. Why use the paid tier? When you enable billing and use the paid tier, you benefit from higher rate limits , and your prompts and responses aren't used to improve Google products. For more information on data use for paid services, see the terms of service . Cloud Billing The Gemini API uses Cloud Billing for billing services. To use the paid tier, you must set up Cloud Billing on your cloud project. After you've enabled Cloud Billing, you can use Cloud Billing tools to track spending, understand costs, make payments, and access Cloud Billing support. Enable billing You can enable Cloud Billing starting from Google AI Studio: Open Google AI Studio . In the bottom of the left sidebar, select Settings > Plan information . Click Set up Billing for your chosen project to enable Cloud Billing. Monitor usage After you enable Cloud Billing, you can monitor your usage of the Gemini API in the Google Cloud console . The service name for the API is generativelanguage.googleapis.com , and in the console the Gemini API is also referred to as the Generative Language API . To learn more, see the Google Cloud documentation on monitoring API usage . Frequently asked questions This section provides answers to frequently asked questions. What am I billed for? Gemini API pricing is based on the following: Input token count Output token count Cached token count Cached token storage duration For pricing information, see the pricing page . Where can I view my quota? 
You can view your quota and system limits in the Google Cloud console . How do I request more quota? To request more quota, follow the instructions at How to request an upgrade . Can I use the Gemini API for free in EEA (including EU), the UK, and CH? Yes, we make the free tier and paid tier available in many regions . If I set up billing with the Gemini API, will I be charged for my Google AI Studio usage? No, Google AI Studio usage remains free of charge regardless of if you set up billing across all supported regions. Can I use 1M tokens in the free tier? The free tier for Gemini API differs based on the model selected. For now, you can try the 1M token context window in the following ways: In Google AI Studio With pay-as-you-go plans With free-of-charge plans for select models See the latest free-of-charge rate limits per model on rate limits page . How can I calculate the number of tokens I'm using? Use the GenerativeModel.count_tokens method to count the number of tokens. Refer to the Tokens guide to learn more about tokens. Can I use my Google Cloud credits with the Gemini API? Yes, Google Cloud credits can be used towards Gemini API usage. How is billing handled? Billing for the Gemini API is handled by the Cloud Billing system. Am I charged for failed requests? If your request fails with a 400 or 500 error, you won't be charged for the tokens used. However, the request will still count against your quota. Is there a charge for fine-tuning the models? Model tuning is free, but inference on tuned models is charged at the same rate as the base models. Is GetTokens billed? Requests to the GetTokens API are not billed, and they don't count against inference quota. How is my Google AI Studio data handled if I have a paid API account? Refer to the terms for details on how data is handled when Cloud billing is enabled (see "How Google Uses Your Data" under "Paid Services"). Note that your Google AI Studio prompts are treated under the same "Paid Services" terms so long as at least 1 API project has billing enabled, which you can validate on the Gemini API Key page if you see any projects marked as "Paid" under "Plan". Where can I get help with billing? To get help with billing, see Get Cloud Billing support . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-05-31 UTC.
 
  Title: Billing  |  Gemini API  |  Google AI for Developers
  ==================================================

+ Billing | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Billing This guide provides an overview of different Gemini API billing options, explains how to enable billing and monitor usage, and provides answers to frequently asked questions (FAQs) about billing. Upgrade to the Gemini API paid tier About billing Billing for the Gemini API is based on two pricing tiers: free of charge (or free ) and pay-as-you-go (or paid ). Pricing and rate limits differ between these tiers and also vary by model. You can check out the rate limits and pricing pages for more into. For a model-by-model breakdown of capabilities, see the Gemini models page . How to request an upgrade To transition from the free tier to the pay-as-you-go plan, you need to enable billing for your Google Cloud project. The button you see in Google AI Studio depends on your project's current plan. If you're on the free tier, you'll see a Set up Billing button for your project. If you're already on the paid tier and meet the criteria for a plan change, you might see an Upgrade button. To start the process, follow these steps: Go to the AI Studio API keys page . Find the project you want to move to the paid plan and click either Set up Billing or Upgrade , depending on the button displayed. The next step depends on the button you clicked: If you clicked Set up Billing: You'll be redirected to the Google Cloud console to link a billing account to your project. Follow the on-screen instructions to complete the process. If you clicked Upgrade: The system will automatically verify your project's eligibility. If your project meets all the requirements, it will be instantly upgraded to the next tier. Why use the paid tier? When you enable billing and use the paid tier, you benefit from higher rate limits , and your prompts and responses aren't used to improve Google products. For more information on data use for paid services, see the terms of service . Cloud Billing The Gemini API uses Cloud Billing for billing services. To use the paid tier, you must set up Cloud Billing on your cloud project. After you've enabled Cloud Billing, you can use Cloud Billing tools to track spending, understand costs, make payments, and access Cloud Billing support. Enable billing You can enable Cloud Billing starting from Google AI Studio: Open Google AI Studio . In the bottom of the left sidebar, select Settings > Plan information . Click Set up Billing for your chosen project to enable Cloud Billing. Monitor usage After you enable Cloud Billing, you can monitor your usage of the Gemini API in the Google Cloud console . The service name for the API is generativelanguage.googleapis.com , and in the console the Gemini API is also referred to as the Generative Language API . To learn more, see the Google Cloud documentation on monitoring API usage . Frequently asked questions This section provides answers to frequently asked questions. What am I billed for? Gemini API pricing is based on the following: Input token count Output token count Cached token count Cached token storage duration For pricing information, see the pricing page . Where can I view my quota? 
You can view your quota and system limits in the Google Cloud console . How do I request more quota? To request more quota, follow the instructions at How to request an upgrade . Can I use the Gemini API for free in EEA (including EU), the UK, and CH? Yes, we make the free tier and paid tier available in many regions . If I set up billing with the Gemini API, will I be charged for my Google AI Studio usage? No, Google AI Studio usage remains free of charge regardless of if you set up billing across all supported regions. Can I use 1M tokens in the free tier? The free tier for Gemini API differs based on the model selected. For now, you can try the 1M token context window in the following ways: In Google AI Studio With pay-as-you-go plans With free-of-charge plans for select models See the latest free-of-charge rate limits per model on rate limits page . How can I calculate the number of tokens I'm using? Use the GenerativeModel.count_tokens method to count the number of tokens. Refer to the Tokens guide to learn more about tokens. Can I use my Google Cloud credits with the Gemini API? Yes, Google Cloud credits can be used towards Gemini API usage. How is billing handled? Billing for the Gemini API is handled by the Cloud Billing system. Am I charged for failed requests? If your request fails with a 400 or 500 error, you won't be charged for the tokens used. However, the request will still count against your quota. Is there a charge for fine-tuning the models? Model tuning is free, but inference on tuned models is charged at the same rate as the base models. Is GetTokens billed? Requests to the GetTokens API are not billed, and they don't count against inference quota. How is my Google AI Studio data handled if I have a paid API account? Refer to the terms for details on how data is handled when Cloud billing is enabled (see "How Google Uses Your Data" under "Paid Services"). Note that your Google AI Studio prompts are treated under the same "Paid Services" terms so long as at least 1 API project has billing enabled, which you can validate on the Gemini API Key page if you see any projects marked as "Paid" under "Plan". Where can I get help with billing? To get help with billing, see Get Cloud Billing support . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
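The count_tokens call mentioned in the FAQ above can be run before sending a prompt to estimate billed input tokens. The following is a minimal sketch, assuming the google-genai Python package and a GEMINI_API_KEY environment variable; the model name and prompt are placeholders, and the newer client exposes counting as client.models.count_tokens rather than GenerativeModel.count_tokens.

from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Count the tokens a prompt would consume before sending it.
token_info = client.models.count_tokens(
    model="gemini-2.5-flash",
    contents="Explain how AI works in a few words",
)
print(token_info.total_tokens)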
text_content/docs_billing_932c132c.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/billing#main-content
  Title: Billing  |  Gemini API  |  Google AI for Developers
  ==================================================

- Billing | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Billing This guide provides an overview of different Gemini API billing options, explains how to enable billing and monitor usage, and provides answers to frequently asked questions (FAQs) about billing. Upgrade to the Gemini API paid tier About billing Billing for the Gemini API is based on two pricing tiers: free of charge (or free ) and pay-as-you-go (or paid ). Pricing and rate limits differ between these tiers and also vary by model. You can check out the rate limits and pricing pages for more into. For a model-by-model breakdown of capabilities, see the Gemini models page . How to request an upgrade To transition from the free tier to the pay-as-you-go plan, you need to enable billing for your Google Cloud project. The button you see in Google AI Studio depends on your project's current plan. If you're on the free tier, you'll see a Set up Billing button for your project. If you're already on the paid tier and meet the criteria for a plan change, you might see an Upgrade button. To start the process, follow these steps: Go to the AI Studio API keys page . Find the project you want to move to the paid plan and click either Set up Billing or Upgrade , depending on the button displayed. The next step depends on the button you clicked: If you clicked Set up Billing: You'll be redirected to the Google Cloud console to link a billing account to your project. Follow the on-screen instructions to complete the process. If you clicked Upgrade: The system will automatically verify your project's eligibility. If your project meets all the requirements, it will be instantly upgraded to the next tier. Why use the paid tier? When you enable billing and use the paid tier, you benefit from higher rate limits , and your prompts and responses aren't used to improve Google products. For more information on data use for paid services, see the terms of service . Cloud Billing The Gemini API uses Cloud Billing for billing services. To use the paid tier, you must set up Cloud Billing on your cloud project. After you've enabled Cloud Billing, you can use Cloud Billing tools to track spending, understand costs, make payments, and access Cloud Billing support. Enable billing You can enable Cloud Billing starting from Google AI Studio: Open Google AI Studio . In the bottom of the left sidebar, select Settings > Plan information . Click Set up Billing for your chosen project to enable Cloud Billing. Monitor usage After you enable Cloud Billing, you can monitor your usage of the Gemini API in the Google Cloud console . The service name for the API is generativelanguage.googleapis.com , and in the console the Gemini API is also referred to as the Generative Language API . To learn more, see the Google Cloud documentation on monitoring API usage . Frequently asked questions This section provides answers to frequently asked questions. What am I billed for? Gemini API pricing is based on the following: Input token count Output token count Cached token count Cached token storage duration For pricing information, see the pricing page . Where can I view my quota? 
You can view your quota and system limits in the Google Cloud console . How do I request more quota? To request more quota, follow the instructions at How to request an upgrade . Can I use the Gemini API for free in EEA (including EU), the UK, and CH? Yes, we make the free tier and paid tier available in many regions . If I set up billing with the Gemini API, will I be charged for my Google AI Studio usage? No, Google AI Studio usage remains free of charge regardless of if you set up billing across all supported regions. Can I use 1M tokens in the free tier? The free tier for Gemini API differs based on the model selected. For now, you can try the 1M token context window in the following ways: In Google AI Studio With pay-as-you-go plans With free-of-charge plans for select models See the latest free-of-charge rate limits per model on rate limits page . How can I calculate the number of tokens I'm using? Use the GenerativeModel.count_tokens method to count the number of tokens. Refer to the Tokens guide to learn more about tokens. Can I use my Google Cloud credits with the Gemini API? Yes, Google Cloud credits can be used towards Gemini API usage. How is billing handled? Billing for the Gemini API is handled by the Cloud Billing system. Am I charged for failed requests? If your request fails with a 400 or 500 error, you won't be charged for the tokens used. However, the request will still count against your quota. Is there a charge for fine-tuning the models? Model tuning is free, but inference on tuned models is charged at the same rate as the base models. Is GetTokens billed? Requests to the GetTokens API are not billed, and they don't count against inference quota. How is my Google AI Studio data handled if I have a paid API account? Refer to the terms for details on how data is handled when Cloud billing is enabled (see "How Google Uses Your Data" under "Paid Services"). Note that your Google AI Studio prompts are treated under the same "Paid Services" terms so long as at least 1 API project has billing enabled, which you can validate on the Gemini API Key page if you see any projects marked as "Paid" under "Plan". Where can I get help with billing? To get help with billing, see Get Cloud Billing support . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-05-31 UTC.
 
  Title: Billing  |  Gemini API  |  Google AI for Developers
  ==================================================

+ Billing | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Billing This guide provides an overview of different Gemini API billing options, explains how to enable billing and monitor usage, and provides answers to frequently asked questions (FAQs) about billing. Upgrade to the Gemini API paid tier About billing Billing for the Gemini API is based on two pricing tiers: free of charge (or free ) and pay-as-you-go (or paid ). Pricing and rate limits differ between these tiers and also vary by model. You can check out the rate limits and pricing pages for more into. For a model-by-model breakdown of capabilities, see the Gemini models page . How to request an upgrade To transition from the free tier to the pay-as-you-go plan, you need to enable billing for your Google Cloud project. The button you see in Google AI Studio depends on your project's current plan. If you're on the free tier, you'll see a Set up Billing button for your project. If you're already on the paid tier and meet the criteria for a plan change, you might see an Upgrade button. To start the process, follow these steps: Go to the AI Studio API keys page . Find the project you want to move to the paid plan and click either Set up Billing or Upgrade , depending on the button displayed. The next step depends on the button you clicked: If you clicked Set up Billing: You'll be redirected to the Google Cloud console to link a billing account to your project. Follow the on-screen instructions to complete the process. If you clicked Upgrade: The system will automatically verify your project's eligibility. If your project meets all the requirements, it will be instantly upgraded to the next tier. Why use the paid tier? When you enable billing and use the paid tier, you benefit from higher rate limits , and your prompts and responses aren't used to improve Google products. For more information on data use for paid services, see the terms of service . Cloud Billing The Gemini API uses Cloud Billing for billing services. To use the paid tier, you must set up Cloud Billing on your cloud project. After you've enabled Cloud Billing, you can use Cloud Billing tools to track spending, understand costs, make payments, and access Cloud Billing support. Enable billing You can enable Cloud Billing starting from Google AI Studio: Open Google AI Studio . In the bottom of the left sidebar, select Settings > Plan information . Click Set up Billing for your chosen project to enable Cloud Billing. Monitor usage After you enable Cloud Billing, you can monitor your usage of the Gemini API in the Google Cloud console . The service name for the API is generativelanguage.googleapis.com , and in the console the Gemini API is also referred to as the Generative Language API . To learn more, see the Google Cloud documentation on monitoring API usage . Frequently asked questions This section provides answers to frequently asked questions. What am I billed for? Gemini API pricing is based on the following: Input token count Output token count Cached token count Cached token storage duration For pricing information, see the pricing page . Where can I view my quota? 
You can view your quota and system limits in the Google Cloud console . How do I request more quota? To request more quota, follow the instructions at How to request an upgrade . Can I use the Gemini API for free in EEA (including EU), the UK, and CH? Yes, we make the free tier and paid tier available in many regions . If I set up billing with the Gemini API, will I be charged for my Google AI Studio usage? No, Google AI Studio usage remains free of charge regardless of if you set up billing across all supported regions. Can I use 1M tokens in the free tier? The free tier for Gemini API differs based on the model selected. For now, you can try the 1M token context window in the following ways: In Google AI Studio With pay-as-you-go plans With free-of-charge plans for select models See the latest free-of-charge rate limits per model on rate limits page . How can I calculate the number of tokens I'm using? Use the GenerativeModel.count_tokens method to count the number of tokens. Refer to the Tokens guide to learn more about tokens. Can I use my Google Cloud credits with the Gemini API? Yes, Google Cloud credits can be used towards Gemini API usage. How is billing handled? Billing for the Gemini API is handled by the Cloud Billing system. Am I charged for failed requests? If your request fails with a 400 or 500 error, you won't be charged for the tokens used. However, the request will still count against your quota. Is there a charge for fine-tuning the models? Model tuning is free, but inference on tuned models is charged at the same rate as the base models. Is GetTokens billed? Requests to the GetTokens API are not billed, and they don't count against inference quota. How is my Google AI Studio data handled if I have a paid API account? Refer to the terms for details on how data is handled when Cloud billing is enabled (see "How Google Uses Your Data" under "Paid Services"). Note that your Google AI Studio prompts are treated under the same "Paid Services" terms so long as at least 1 API project has billing enabled, which you can validate on the Gemini API Key page if you see any projects marked as "Paid" under "Plan". Where can I get help with billing? To get help with billing, see Get Cloud Billing support . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
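The token counts that billing is based on (input, output, and cached tokens) are also reported back with every response. A rough sketch of inspecting them with the Python client, assuming the usage_metadata fields exposed by the google-genai package; the prompt is only an example.

from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize the rules of chess in one sentence.",
)

usage = response.usage_metadata
print("input tokens:", usage.prompt_token_count)          # billed as input
print("output tokens:", usage.candidates_token_count)     # billed as output
print("cached tokens:", usage.cached_content_token_count) # may be None if no cache was used
print("total tokens:", usage.total_token_count)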
text_content/docs_billing_c19877aa.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/billing#enable-cloud-billing
  Title: Billing  |  Gemini API  |  Google AI for Developers
  ==================================================

- Billing | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Billing This guide provides an overview of different Gemini API billing options, explains how to enable billing and monitor usage, and provides answers to frequently asked questions (FAQs) about billing. Upgrade to the Gemini API paid tier About billing Billing for the Gemini API is based on two pricing tiers: free of charge (or free ) and pay-as-you-go (or paid ). Pricing and rate limits differ between these tiers and also vary by model. You can check out the rate limits and pricing pages for more into. For a model-by-model breakdown of capabilities, see the Gemini models page . How to request an upgrade To transition from the free tier to the pay-as-you-go plan, you need to enable billing for your Google Cloud project. The button you see in Google AI Studio depends on your project's current plan. If you're on the free tier, you'll see a Set up Billing button for your project. If you're already on the paid tier and meet the criteria for a plan change, you might see an Upgrade button. To start the process, follow these steps: Go to the AI Studio API keys page . Find the project you want to move to the paid plan and click either Set up Billing or Upgrade , depending on the button displayed. The next step depends on the button you clicked: If you clicked Set up Billing: You'll be redirected to the Google Cloud console to link a billing account to your project. Follow the on-screen instructions to complete the process. If you clicked Upgrade: The system will automatically verify your project's eligibility. If your project meets all the requirements, it will be instantly upgraded to the next tier. Why use the paid tier? When you enable billing and use the paid tier, you benefit from higher rate limits , and your prompts and responses aren't used to improve Google products. For more information on data use for paid services, see the terms of service . Cloud Billing The Gemini API uses Cloud Billing for billing services. To use the paid tier, you must set up Cloud Billing on your cloud project. After you've enabled Cloud Billing, you can use Cloud Billing tools to track spending, understand costs, make payments, and access Cloud Billing support. Enable billing You can enable Cloud Billing starting from Google AI Studio: Open Google AI Studio . In the bottom of the left sidebar, select Settings > Plan information . Click Set up Billing for your chosen project to enable Cloud Billing. Monitor usage After you enable Cloud Billing, you can monitor your usage of the Gemini API in the Google Cloud console . The service name for the API is generativelanguage.googleapis.com , and in the console the Gemini API is also referred to as the Generative Language API . To learn more, see the Google Cloud documentation on monitoring API usage . Frequently asked questions This section provides answers to frequently asked questions. What am I billed for? Gemini API pricing is based on the following: Input token count Output token count Cached token count Cached token storage duration For pricing information, see the pricing page . Where can I view my quota? 
You can view your quota and system limits in the Google Cloud console . How do I request more quota? To request more quota, follow the instructions at How to request an upgrade . Can I use the Gemini API for free in EEA (including EU), the UK, and CH? Yes, we make the free tier and paid tier available in many regions . If I set up billing with the Gemini API, will I be charged for my Google AI Studio usage? No, Google AI Studio usage remains free of charge regardless of if you set up billing across all supported regions. Can I use 1M tokens in the free tier? The free tier for Gemini API differs based on the model selected. For now, you can try the 1M token context window in the following ways: In Google AI Studio With pay-as-you-go plans With free-of-charge plans for select models See the latest free-of-charge rate limits per model on rate limits page . How can I calculate the number of tokens I'm using? Use the GenerativeModel.count_tokens method to count the number of tokens. Refer to the Tokens guide to learn more about tokens. Can I use my Google Cloud credits with the Gemini API? Yes, Google Cloud credits can be used towards Gemini API usage. How is billing handled? Billing for the Gemini API is handled by the Cloud Billing system. Am I charged for failed requests? If your request fails with a 400 or 500 error, you won't be charged for the tokens used. However, the request will still count against your quota. Is there a charge for fine-tuning the models? Model tuning is free, but inference on tuned models is charged at the same rate as the base models. Is GetTokens billed? Requests to the GetTokens API are not billed, and they don't count against inference quota. How is my Google AI Studio data handled if I have a paid API account? Refer to the terms for details on how data is handled when Cloud billing is enabled (see "How Google Uses Your Data" under "Paid Services"). Note that your Google AI Studio prompts are treated under the same "Paid Services" terms so long as at least 1 API project has billing enabled, which you can validate on the Gemini API Key page if you see any projects marked as "Paid" under "Plan". Where can I get help with billing? To get help with billing, see Get Cloud Billing support . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-05-31 UTC.
 
  Title: Billing  |  Gemini API  |  Google AI for Developers
  ==================================================

+ Billing | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Billing This guide provides an overview of different Gemini API billing options, explains how to enable billing and monitor usage, and provides answers to frequently asked questions (FAQs) about billing. Upgrade to the Gemini API paid tier About billing Billing for the Gemini API is based on two pricing tiers: free of charge (or free ) and pay-as-you-go (or paid ). Pricing and rate limits differ between these tiers and also vary by model. You can check out the rate limits and pricing pages for more into. For a model-by-model breakdown of capabilities, see the Gemini models page . How to request an upgrade To transition from the free tier to the pay-as-you-go plan, you need to enable billing for your Google Cloud project. The button you see in Google AI Studio depends on your project's current plan. If you're on the free tier, you'll see a Set up Billing button for your project. If you're already on the paid tier and meet the criteria for a plan change, you might see an Upgrade button. To start the process, follow these steps: Go to the AI Studio API keys page . Find the project you want to move to the paid plan and click either Set up Billing or Upgrade , depending on the button displayed. The next step depends on the button you clicked: If you clicked Set up Billing: You'll be redirected to the Google Cloud console to link a billing account to your project. Follow the on-screen instructions to complete the process. If you clicked Upgrade: The system will automatically verify your project's eligibility. If your project meets all the requirements, it will be instantly upgraded to the next tier. Why use the paid tier? When you enable billing and use the paid tier, you benefit from higher rate limits , and your prompts and responses aren't used to improve Google products. For more information on data use for paid services, see the terms of service . Cloud Billing The Gemini API uses Cloud Billing for billing services. To use the paid tier, you must set up Cloud Billing on your cloud project. After you've enabled Cloud Billing, you can use Cloud Billing tools to track spending, understand costs, make payments, and access Cloud Billing support. Enable billing You can enable Cloud Billing starting from Google AI Studio: Open Google AI Studio . In the bottom of the left sidebar, select Settings > Plan information . Click Set up Billing for your chosen project to enable Cloud Billing. Monitor usage After you enable Cloud Billing, you can monitor your usage of the Gemini API in the Google Cloud console . The service name for the API is generativelanguage.googleapis.com , and in the console the Gemini API is also referred to as the Generative Language API . To learn more, see the Google Cloud documentation on monitoring API usage . Frequently asked questions This section provides answers to frequently asked questions. What am I billed for? Gemini API pricing is based on the following: Input token count Output token count Cached token count Cached token storage duration For pricing information, see the pricing page . Where can I view my quota? 
You can view your quota and system limits in the Google Cloud console . How do I request more quota? To request more quota, follow the instructions at How to request an upgrade . Can I use the Gemini API for free in EEA (including EU), the UK, and CH? Yes, we make the free tier and paid tier available in many regions . If I set up billing with the Gemini API, will I be charged for my Google AI Studio usage? No, Google AI Studio usage remains free of charge regardless of if you set up billing across all supported regions. Can I use 1M tokens in the free tier? The free tier for Gemini API differs based on the model selected. For now, you can try the 1M token context window in the following ways: In Google AI Studio With pay-as-you-go plans With free-of-charge plans for select models See the latest free-of-charge rate limits per model on rate limits page . How can I calculate the number of tokens I'm using? Use the GenerativeModel.count_tokens method to count the number of tokens. Refer to the Tokens guide to learn more about tokens. Can I use my Google Cloud credits with the Gemini API? Yes, Google Cloud credits can be used towards Gemini API usage. How is billing handled? Billing for the Gemini API is handled by the Cloud Billing system. Am I charged for failed requests? If your request fails with a 400 or 500 error, you won't be charged for the tokens used. However, the request will still count against your quota. Is there a charge for fine-tuning the models? Model tuning is free, but inference on tuned models is charged at the same rate as the base models. Is GetTokens billed? Requests to the GetTokens API are not billed, and they don't count against inference quota. How is my Google AI Studio data handled if I have a paid API account? Refer to the terms for details on how data is handled when Cloud billing is enabled (see "How Google Uses Your Data" under "Paid Services"). Note that your Google AI Studio prompts are treated under the same "Paid Services" terms so long as at least 1 API project has billing enabled, which you can validate on the Gemini API Key page if you see any projects marked as "Paid" under "Plan". Where can I get help with billing? To get help with billing, see Get Cloud Billing support . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
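Because requests that fail with a 400 or 500 error are not charged for tokens but still consume quota, it can help to handle those failures explicitly. A hedged sketch, assuming the errors module shipped with the google-genai package and its APIError code and message attributes.

from google import genai
from google.genai import errors

client = genai.Client()

try:
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents="Explain how AI works in a few words",
    )
    print(response.text)
except errors.APIError as e:
    # 4xx/5xx failures are not billed for tokens, but they still count against quota.
    print(f"Request failed with status {e.code}: {e.message}")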
text_content/docs_code-execution_61a8afb4.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/code-execution
  Title: Code execution  |  Gemini API  |  Google AI for Developers
  ==================================================

- Code execution | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Code execution The Gemini API provides a code execution tool that enables the model to generate and run Python code. The model can then learn iteratively from the code execution results until it arrives at a final output. You can use code execution to build applications that benefit from code-based reasoning. For example, you can use code execution to solve equations or process text. You can also use the libraries included in the code execution environment to perform more specialized tasks. Gemini is only able to execute code in Python. You can still ask Gemini to generate code in another language, but the model can't use the code execution tool to run it. Enable code execution To enable code execution, configure the code execution tool on the model. This allows the model to generate and run code. Python from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What is the sum of the first 50 prime numbers? " "Generate and run code for the calculation, and make sure you get all 50." , config = types . GenerateContentConfig ( tools = [ types . Tool ( code_execution = types . ToolCodeExecution )] ), ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . output ) JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({}); let response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : [ "What is the sum of the first 50 prime numbers? " + "Generate and run code for the calculation, and make sure you get all 50." , ], config : { tools : [{ codeExecution : {} }], }, }); const parts = response ? . candidates ? .[ 0 ] ? . content ? . parts || []; parts . forEach (( part ) = > { if ( part . text ) { console . log ( part . text ); } if ( part . executableCode && part . executableCode . code ) { console . log ( part . executableCode . code ); } if ( part . codeExecutionResult && part . codeExecutionResult . output ) { console . log ( part . codeExecutionResult . output ); } }); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateContentConfig { Tools : [] * genai . Tool { { CodeExecution : & genai . ToolCodeExecution {}}, }, } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , genai . Text ( "What is the sum of the first 50 prime numbers? " + "Generate and run code for the calculation, and make sure you get all 50." ), config , ) fmt . Println ( result . Text ()) fmt . Println ( result . ExecutableCode ()) fmt . Println ( result . 
CodeExecutionResult ()) } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d ' {"tools": [{"code_execution": {}}], "contents": { "parts": { "text": "What is the sum of the first 50 prime numbers? Generate and run code for the calculation, and make sure you get all 50." } }, }' Note: This REST example doesn't parse the JSON response as shown in the example output. The output might look something like the following, which has been formatted for readability: Okay, I need to calculate the sum of the first 50 prime numbers. Here's how I'll approach this: 1. **Generate Prime Numbers:** I'll use an iterative method to find prime numbers. I'll start with 2 and check if each subsequent number is divisible by any number between 2 and its square root. If not, it's a prime. 2. **Store Primes:** I'll store the prime numbers in a list until I have 50 of them. 3. **Calculate the Sum:** Finally, I'll sum the prime numbers in the list. Here's the Python code to do this: def is_prime(n): """Efficiently checks if a number is prime.""" if n <= 1: return False if n <= 3: return True if n % 2 == 0 or n % 3 == 0: return False i = 5 while i * i <= n: if n % i == 0 or n % (i + 2) == 0: return False i += 6 return True primes = [] num = 2 while len(primes) < 50: if is_prime(num): primes.append(num) num += 1 sum_of_primes = sum(primes) print(f'{primes=}') print(f'{sum_of_primes=}') primes=[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229] sum_of_primes=5117 The sum of the first 50 prime numbers is 5117. This output combines several content parts that the model returns when using code execution: text : Inline text generated by the model executableCode : Code generated by the model that is meant to be executed codeExecutionResult : Result of the executable code The naming conventions for these parts vary by programming language. Use code execution in chat You can also use code execution as part of a chat. Python from google import genai from google.genai import types client = genai . Client () chat = client . chats . create ( model = "gemini-2.5-flash" , config = types . GenerateContentConfig ( tools = [ types . Tool ( code_execution = types . ToolCodeExecution )] ), ) response = chat . send_message ( "I have a math question for you." ) print ( response . text ) response = chat . send_message ( "What is the sum of the first 50 prime numbers? " "Generate and run code for the calculation, and make sure you get all 50." ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . output ) JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({}); const chat = ai . chats . create ({ model : "gemini-2.5-flash" , history : [ { role : "user" , parts : [{ text : "I have a math question for you:" }], }, { role : "model" , parts : [{ text : "Great! I'm ready for your math question. Please ask away." }], }, ], config : { tools : [{ codeExecution : {}}], } }); const response = await chat . sendMessage ({ message : "What is the sum of the first 50 prime numbers? 
" + "Generate and run code for the calculation, and make sure you get all 50." }); console . log ( "Chat response:" , response . text ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateContentConfig { Tools : [] * genai . Tool { { CodeExecution : & genai . ToolCodeExecution {}}, }, } chat , _ := client . Chats . Create ( ctx , "gemini-2.5-flash" , config , nil , ) result , _ := chat . SendMessage ( ctx , genai . Part { Text : "What is the sum of the first 50 prime numbers? " + "Generate and run code for the calculation, and " + "make sure you get all 50." , }, ) fmt . Println ( result . Text ()) fmt . Println ( result . ExecutableCode ()) fmt . Println ( result . CodeExecutionResult ()) } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d '{"tools": [{"code_execution": {}}], "contents": [ { "role": "user", "parts": [{ "text": "Can you print \"Hello world!\"?" }] },{ "role": "model", "parts": [ { "text": "" }, { "executable_code": { "language": "PYTHON", "code": "\nprint(\"hello world!\")\n" } }, { "code_execution_result": { "outcome": "OUTCOME_OK", "output": "hello world!\n" } }, { "text": "I have printed \"hello world!\" using the provided python code block. \n" } ], },{ "role": "user", "parts": [{ "text": "What is the sum of the first 50 prime numbers? Generate and run code for the calculation, and make sure you get all 50." }] } ] }' Input/output (I/O) Starting with Gemini 2.0 Flash , code execution supports file input and graph output. Using these input and output capabilities, you can upload CSV and text files, ask questions about the files, and have Matplotlib graphs generated as part of the response. The output files are returned as inline images in the response. I/O pricing When using code execution I/O, you're charged for input tokens and output tokens: Input tokens: User prompt Output tokens: Code generated by the model Code execution output in the code environment Thinking tokens Summary generated by the model I/O details When you're working with code execution I/O, be aware of the following technical details: The maximum runtime of the code environment is 30 seconds. If the code environment generates an error, the model may decide to regenerate the code output. This can happen up to 5 times. The maximum file input size is limited by the model token window. In AI Studio, using Gemini Flash 2.0, the maximum input file size is 1 million tokens (roughly 2MB for text files of the supported input types). If you upload a file that's too large, AI Studio won't let you send it. Code execution works best with text and CSV files. The input file can be passed in part.inlineData or part.fileData (uploaded via the Files API ), and the output file is always returned as part.inlineData . Single turn Bidirectional (Multimodal Live API) Models supported All Gemini 2.0 and 2.5 models Only Flash experimental models File input types supported .png, .jpeg, .csv, .xml, .cpp, .java, .py, .js, .ts .png, .jpeg, .csv, .xml, .cpp, .java, .py, .js, .ts Plotting libraries supported Matplotlib, seaborn Matplotlib, seaborn Multi-tool use Yes (code execution + grounding only) Yes Billing There's no additional charge for enabling code execution from the Gemini API. 
You'll be billed at the current rate of input and output tokens based on the Gemini model you're using. Here are a few other things to know about billing for code execution: You're only billed once for the input tokens you pass to the model, and you're billed for the final output tokens returned to you by the model. Tokens representing generated code are counted as output tokens. Generated code can include text and multimodal output like images. Code execution results are also counted as output tokens. The billing model is shown in the following diagram: You're billed at the current rate of input and output tokens based on the Gemini model you're using. If Gemini uses code execution when generating your response, the original prompt, the generated code, and the result of the executed code are labeled intermediate tokens and are billed as input tokens . Gemini then generates a summary and returns the generated code, the result of the executed code, and the final summary. These are billed as output tokens . The Gemini API includes an intermediate token count in the API response, so you know why you're getting additional input tokens beyond your initial prompt. Limitations The model can only generate and execute code. It can't return other artifacts like media files. In some cases, enabling code execution can lead to regressions in other areas of model output (for example, writing a story). There is some variation in the ability of the different models to use code execution successfully. Supported libraries The code execution environment includes the following libraries: attrs chess contourpy fpdf geopandas imageio jinja2 joblib jsonschema jsonschema-specifications lxml matplotlib mpmath numpy opencv-python openpyxl packaging pandas pillow protobuf pylatex pyparsing PyPDF2 python-dateutil python-docx python-pptx reportlab scikit-learn scipy seaborn six striprtf sympy tabulate tensorflow toolz xlrd You can't install your own libraries. Note: Only matplotlib is supported for graph rendering using code execution. What's next Try the code execution Colab . Learn about other Gemini API tools: Function calling Grounding with Google Search Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
 
  Title: Code execution  |  Gemini API  |  Google AI for Developers
  ==================================================

+ Code execution | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Code execution The Gemini API provides a code execution tool that enables the model to generate and run Python code. The model can then learn iteratively from the code execution results until it arrives at a final output. You can use code execution to build applications that benefit from code-based reasoning. For example, you can use code execution to solve equations or process text. You can also use the libraries included in the code execution environment to perform more specialized tasks. Gemini is only able to execute code in Python. You can still ask Gemini to generate code in another language, but the model can't use the code execution tool to run it. Enable code execution To enable code execution, configure the code execution tool on the model. This allows the model to generate and run code. Python from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What is the sum of the first 50 prime numbers? " "Generate and run code for the calculation, and make sure you get all 50." , config = types . GenerateContentConfig ( tools = [ types . Tool ( code_execution = types . ToolCodeExecution )] ), ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . output ) JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({}); let response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : [ "What is the sum of the first 50 prime numbers? " + "Generate and run code for the calculation, and make sure you get all 50." , ], config : { tools : [{ codeExecution : {} }], }, }); const parts = response ? . candidates ? .[ 0 ] ? . content ? . parts || []; parts . forEach (( part ) = > { if ( part . text ) { console . log ( part . text ); } if ( part . executableCode && part . executableCode . code ) { console . log ( part . executableCode . code ); } if ( part . codeExecutionResult && part . codeExecutionResult . output ) { console . log ( part . codeExecutionResult . output ); } }); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateContentConfig { Tools : [] * genai . Tool { { CodeExecution : & genai . ToolCodeExecution {}}, }, } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , genai . Text ( "What is the sum of the first 50 prime numbers? " + "Generate and run code for the calculation, and make sure you get all 50." ), config , ) fmt . Println ( result . Text ()) fmt . Println ( result . ExecutableCode ()) fmt . Println ( result . 
CodeExecutionResult ()) } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d ' {"tools": [{"code_execution": {}}], "contents": { "parts": { "text": "What is the sum of the first 50 prime numbers? Generate and run code for the calculation, and make sure you get all 50." } }, }' Note: This REST example doesn't parse the JSON response as shown in the example output. The output might look something like the following, which has been formatted for readability: Okay, I need to calculate the sum of the first 50 prime numbers. Here's how I'll approach this: 1. **Generate Prime Numbers:** I'll use an iterative method to find prime numbers. I'll start with 2 and check if each subsequent number is divisible by any number between 2 and its square root. If not, it's a prime. 2. **Store Primes:** I'll store the prime numbers in a list until I have 50 of them. 3. **Calculate the Sum:** Finally, I'll sum the prime numbers in the list. Here's the Python code to do this: def is_prime(n): """Efficiently checks if a number is prime.""" if n <= 1: return False if n <= 3: return True if n % 2 == 0 or n % 3 == 0: return False i = 5 while i * i <= n: if n % i == 0 or n % (i + 2) == 0: return False i += 6 return True primes = [] num = 2 while len(primes) < 50: if is_prime(num): primes.append(num) num += 1 sum_of_primes = sum(primes) print(f'{primes=}') print(f'{sum_of_primes=}') primes=[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229] sum_of_primes=5117 The sum of the first 50 prime numbers is 5117. This output combines several content parts that the model returns when using code execution: text : Inline text generated by the model executableCode : Code generated by the model that is meant to be executed codeExecutionResult : Result of the executable code The naming conventions for these parts vary by programming language. Use code execution in chat You can also use code execution as part of a chat. Python from google import genai from google.genai import types client = genai . Client () chat = client . chats . create ( model = "gemini-2.5-flash" , config = types . GenerateContentConfig ( tools = [ types . Tool ( code_execution = types . ToolCodeExecution )] ), ) response = chat . send_message ( "I have a math question for you." ) print ( response . text ) response = chat . send_message ( "What is the sum of the first 50 prime numbers? " "Generate and run code for the calculation, and make sure you get all 50." ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . output ) JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({}); const chat = ai . chats . create ({ model : "gemini-2.5-flash" , history : [ { role : "user" , parts : [{ text : "I have a math question for you:" }], }, { role : "model" , parts : [{ text : "Great! I'm ready for your math question. Please ask away." }], }, ], config : { tools : [{ codeExecution : {}}], } }); const response = await chat . sendMessage ({ message : "What is the sum of the first 50 prime numbers? 
" + "Generate and run code for the calculation, and make sure you get all 50." }); console . log ( "Chat response:" , response . text ); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateContentConfig { Tools : [] * genai . Tool { { CodeExecution : & genai . ToolCodeExecution {}}, }, } chat , _ := client . Chats . Create ( ctx , "gemini-2.5-flash" , config , nil , ) result , _ := chat . SendMessage ( ctx , genai . Part { Text : "What is the sum of the first 50 prime numbers? " + "Generate and run code for the calculation, and " + "make sure you get all 50." , }, ) fmt . Println ( result . Text ()) fmt . Println ( result . ExecutableCode ()) fmt . Println ( result . CodeExecutionResult ()) } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d '{"tools": [{"code_execution": {}}], "contents": [ { "role": "user", "parts": [{ "text": "Can you print \"Hello world!\"?" }] },{ "role": "model", "parts": [ { "text": "" }, { "executable_code": { "language": "PYTHON", "code": "\nprint(\"hello world!\")\n" } }, { "code_execution_result": { "outcome": "OUTCOME_OK", "output": "hello world!\n" } }, { "text": "I have printed \"hello world!\" using the provided python code block. \n" } ], },{ "role": "user", "parts": [{ "text": "What is the sum of the first 50 prime numbers? Generate and run code for the calculation, and make sure you get all 50." }] } ] }' Input/output (I/O) Starting with Gemini 2.0 Flash , code execution supports file input and graph output. Using these input and output capabilities, you can upload CSV and text files, ask questions about the files, and have Matplotlib graphs generated as part of the response. The output files are returned as inline images in the response. I/O pricing When using code execution I/O, you're charged for input tokens and output tokens: Input tokens: User prompt Output tokens: Code generated by the model Code execution output in the code environment Thinking tokens Summary generated by the model I/O details When you're working with code execution I/O, be aware of the following technical details: The maximum runtime of the code environment is 30 seconds. If the code environment generates an error, the model may decide to regenerate the code output. This can happen up to 5 times. The maximum file input size is limited by the model token window. In AI Studio, using Gemini Flash 2.0, the maximum input file size is 1 million tokens (roughly 2MB for text files of the supported input types). If you upload a file that's too large, AI Studio won't let you send it. Code execution works best with text and CSV files. The input file can be passed in part.inlineData or part.fileData (uploaded via the Files API ), and the output file is always returned as part.inlineData . Single turn Bidirectional (Multimodal Live API) Models supported All Gemini 2.0 and 2.5 models Only Flash experimental models File input types supported .png, .jpeg, .csv, .xml, .cpp, .java, .py, .js, .ts .png, .jpeg, .csv, .xml, .cpp, .java, .py, .js, .ts Plotting libraries supported Matplotlib, seaborn Matplotlib, seaborn Multi-tool use Yes (code execution + grounding only) Yes Billing There's no additional charge for enabling code execution from the Gemini API. 
You'll be billed at the current rate of input and output tokens based on the Gemini model you're using. Here are a few other things to know about billing for code execution: You're only billed once for the input tokens you pass to the model, and you're billed for the final output tokens returned to you by the model. Tokens representing generated code are counted as output tokens. Generated code can include text and multimodal output like images. Code execution results are also counted as output tokens. The billing model is shown in the following diagram: You're billed at the current rate of input and output tokens based on the Gemini model you're using. If Gemini uses code execution when generating your response, the original prompt, the generated code, and the result of the executed code are labeled intermediate tokens and are billed as input tokens . Gemini then generates a summary and returns the generated code, the result of the executed code, and the final summary. These are billed as output tokens . The Gemini API includes an intermediate token count in the API response, so you know why you're getting additional input tokens beyond your initial prompt. Limitations The model can only generate and execute code. It can't return other artifacts like media files. In some cases, enabling code execution can lead to regressions in other areas of model output (for example, writing a story). There is some variation in the ability of the different models to use code execution successfully. Supported libraries The code execution environment includes the following libraries: attrs chess contourpy fpdf geopandas imageio jinja2 joblib jsonschema jsonschema-specifications lxml matplotlib mpmath numpy opencv-python openpyxl packaging pandas pillow protobuf pylatex pyparsing PyPDF2 python-dateutil python-docx python-pptx reportlab scikit-learn scipy seaborn six striprtf sympy tabulate tensorflow toolz xlrd You can't install your own libraries. Note: Only matplotlib is supported for graph rendering using code execution. What's next Try the code execution Colab . Learn about other Gemini API tools: Function calling Grounding with Google Search Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
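The file-input flow described in the I/O section (a CSV passed as part.inlineData alongside the code execution tool, with any Matplotlib plot returned as an inline image) might look like the following sketch in Python. The file name, column name, and prompt are placeholders, not part of the documented API surface.

from google import genai
from google.genai import types

client = genai.Client()

with open("data.csv", "rb") as f:  # hypothetical local CSV file
    csv_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        types.Part.from_bytes(data=csv_bytes, mime_type="text/csv"),
        "Plot a histogram of the 'age' column and report its mean.",  # 'age' is a placeholder column
    ],
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution)]
    ),
)

for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    if part.executable_code:
        print(part.executable_code.code)
    if part.inline_data:
        # Generated plots are returned as inline image bytes.
        with open("histogram.png", "wb") as out:
            out.write(part.inline_data.data)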
text_content/docs_document-processing_793c51fa.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/document-processing#main-content
2
  Title: Document understanding  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Document understanding | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Document understanding Gemini models can process documents in PDF format, using native vision to understand entire document contexts. This goes beyond simple text extraction, allowing Gemini to: Analyze and interpret content, including text, images, diagrams, charts, and tables, even in long documents up to 1000 pages. Extract information into structured output formats. Summarize and answer questions based on both the visual and textual elements in a document. Transcribe document content (e.g. to HTML), preserving layouts and formatting, for use in downstream applications. Passing inline PDF data You can pass inline PDF data in the request to generateContent . For PDF payloads under 20MB, you can choose between uploading base64 encoded documents or directly uploading locally stored files. The following example shows you how to fetch a PDF from a URL and convert it to bytes for processing: Python from google import genai from google.genai import types import httpx client = genai . Client () doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf" # Retrieve and encode the PDF byte doc_data = httpx . get ( doc_url ) . content prompt = "Summarize this document" response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ types . Part . from_bytes ( data = doc_data , mime_type = 'application/pdf' , ), prompt ]) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : "GEMINI_API_KEY" }); async function main () { const pdfResp = await fetch ( 'https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf' ) . then (( response ) = > response . arrayBuffer ()); const contents = [ { text : "Summarize this document" }, { inlineData : { mimeType : 'application/pdf' , data : Buffer . from ( pdfResp ). toString ( "base64" ) } } ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents }); console . log ( response . text ); } main (); Go package main import ( "context" "fmt" "io" "net/http" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , _ := genai . NewClient ( ctx , & genai . ClientConfig { APIKey : os . Getenv ( "GEMINI_API_KEY" ), Backend : genai . BackendGeminiAPI , }) pdfResp , _ := http . Get ( "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf" ) var pdfBytes [] byte if pdfResp != nil && pdfResp . Body != nil { pdfBytes , _ = io . ReadAll ( pdfResp . Body ) pdfResp . Body . Close () } parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "application/pdf" , Data : pdfBytes , }, }, genai . NewPartFromText ( "Summarize this document" ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . 
Text ()) } REST DOC_URL = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf" PROMPT = "Summarize this document" DISPLAY_NAME = "base64_pdf" # Download the PDF wget -O " ${ DISPLAY_NAME } .pdf" " ${ DOC_URL } " # Check for FreeBSD base64 and set flags accordingly if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi # Base64 encode the PDF ENCODED_PDF = $( base64 $B64FLAGS " ${ DISPLAY_NAME } .pdf" ) # Generate content using the base64 encoded PDF curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key= $GOOGLE_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"inline_data": {"mime_type": "application/pdf", "data": "' " $ENCODED_PDF " '"}}, {"text": "' $PROMPT '"} ] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json # Clean up the downloaded PDF rm " ${ DISPLAY_NAME } .pdf" You can also read a PDF from a local file for processing: Python from google import genai from google.genai import types import pathlib client = genai . Client () # Retrieve and encode the PDF byte filepath = pathlib . Path ( 'file.pdf' ) prompt = "Summarize this document" response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ types . Part . from_bytes ( data = filepath . read_bytes (), mime_type = 'application/pdf' , ), prompt ]) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from 'fs' ; const ai = new GoogleGenAI ({ apiKey : "GEMINI_API_KEY" }); async function main () { const contents = [ { text : "Summarize this document" }, { inlineData : { mimeType : 'application/pdf' , data : Buffer . from ( fs . readFileSync ( "content/343019_3_art_0_py4t4l_convrt.pdf" )). toString ( "base64" ) } } ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents }); console . log ( response . text ); } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , _ := genai . NewClient ( ctx , & genai . ClientConfig { APIKey : os . Getenv ( "GEMINI_API_KEY" ), Backend : genai . BackendGeminiAPI , }) pdfBytes , _ := os . ReadFile ( "path/to/your/file.pdf" ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "application/pdf" , Data : pdfBytes , }, }, genai . NewPartFromText ( "Summarize this document" ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } Uploading PDFs using the File API You can use the File API to upload larger documents. Always use the File API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20MB. Note: The File API lets you store up to 50MB of PDF files. Files are stored for 48 hours. You can access them in that period with your API key, but you can't download them from the API. The File API is available at no cost in all regions where the Gemini API is available. Call media.upload to upload a file using the File API. The following code uploads a document file and then uses the file in a call to models.generateContent . 
Large PDFs from URLs Use the File API to simplify uploading and processing large PDF files from URLs: Python from google import genai from google.genai import types import io import httpx client = genai . Client () long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" # Retrieve and upload the PDF using the File API doc_io = io . BytesIO ( httpx . get ( long_context_pdf_path ) . content ) sample_doc = client . files . upload ( # You can pass a path or a file-like object here file = doc_io , config = dict ( mime_type = 'application/pdf' ) ) prompt = "Summarize this document" response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ sample_doc , prompt ]) print ( response . text ) JavaScript import { createPartFromUri , GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : "GEMINI_API_KEY" }); async function main () { const pdfBuffer = await fetch ( "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" ) . then (( response ) = > response . arrayBuffer ()); const fileBlob = new Blob ([ pdfBuffer ], { type : 'application/pdf' }); const file = await ai . files . upload ({ file : fileBlob , config : { displayName : 'A17_FlightPlan.pdf' , }, }); // Wait for the file to be processed. let getFile = await ai . files . get ({ name : file . name }); while ( getFile . state === 'PROCESSING' ) { getFile = await ai . files . get ({ name : file . name }); console . log ( `current file status: ${ getFile . state } ` ); console . log ( 'File is still processing, retrying in 5 seconds' ); await new Promise (( resolve ) = > { setTimeout ( resolve , 5000 ); }); } if ( file . state === 'FAILED' ) { throw new Error ( 'File processing failed.' ); } // Add the file to the contents. const content = [ 'Summarize this document' , ]; if ( file . uri && file . mimeType ) { const fileContent = createPartFromUri ( file . uri , file . mimeType ); content . push ( fileContent ); } const response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : content , }); console . log ( response . text ); } main (); Go package main import ( "context" "fmt" "io" "net/http" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , _ := genai . NewClient ( ctx , & genai . ClientConfig { APIKey : os . Getenv ( "GEMINI_API_KEY" ), Backend : genai . BackendGeminiAPI , }) pdfURL := "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" localPdfPath := "A17_FlightPlan_downloaded.pdf" respHttp , _ := http . Get ( pdfURL ) defer respHttp . Body . Close () outFile , _ := os . Create ( localPdfPath ) defer outFile . Close () _ , _ = io . Copy ( outFile , respHttp . Body ) uploadConfig := & genai . UploadFileConfig { MIMEType : "application/pdf" } uploadedFile , _ := client . Files . UploadFromPath ( ctx , localPdfPath , uploadConfig ) promptParts := [] * genai . Part { genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), genai . NewPartFromText ( "Summarize this document" ), } contents := [] * genai . Content { genai . NewContentFromParts ( promptParts , genai . RoleUser ), // Specify role } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . 
Text ()) } REST PDF_PATH = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" DISPLAY_NAME = "A17_FlightPlan" PROMPT = "Summarize this document" # Download the PDF from the provided URL wget -O " ${ DISPLAY_NAME } .pdf" " ${ PDF_PATH } " MIME_TYPE = $( file -b --mime-type " ${ DISPLAY_NAME } .pdf" ) NUM_BYTES = $( wc -c < " ${ DISPLAY_NAME } .pdf" ) echo "MIME_TYPE: ${ MIME_TYPE } " echo "NUM_BYTES: ${ NUM_BYTES } " tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl " ${ BASE_URL } /upload/v1beta/files?key= ${ GOOGLE_API_KEY } " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ DISPLAY_NAME } .pdf" 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo "file_uri: ${ file_uri } " # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key= $GOOGLE_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "' $PROMPT '"}, {"file_data":{"mime_type": "application/pdf", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json # Clean up the downloaded PDF rm " ${ DISPLAY_NAME } .pdf" Large PDFs stored locally Python from google import genai from google.genai import types import pathlib import httpx client = genai . Client () # Retrieve and encode the PDF byte file_path = pathlib . Path ( 'large_file.pdf' ) # Upload the PDF using the File API sample_file = client . files . upload ( file = file_path , ) prompt = "Summarize this document" response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ sample_file , "Summarize this document" ]) print ( response . text ) JavaScript import { createPartFromUri , GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : "GEMINI_API_KEY" }); async function main () { const file = await ai . files . upload ({ file : 'path-to-localfile.pdf' config : { displayName : 'A17_FlightPlan.pdf' , }, }); // Wait for the file to be processed. let getFile = await ai . files . get ({ name : file . name }); while ( getFile . state === 'PROCESSING' ) { getFile = await ai . files . get ({ name : file . name }); console . log ( `current file status: ${ getFile . state } ` ); console . log ( 'File is still processing, retrying in 5 seconds' ); await new Promise (( resolve ) = > { setTimeout ( resolve , 5000 ); }); } if ( file . state === 'FAILED' ) { throw new Error ( 'File processing failed.' ); } // Add the file to the contents. const content = [ 'Summarize this document' , ]; if ( file . uri && file . mimeType ) { const fileContent = createPartFromUri ( file . uri , file . mimeType ); content . push ( fileContent ); } const response = await ai . models . 
generateContent ({ model : 'gemini-2.5-flash' , contents : content , }); console . log ( response . text ); } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , _ := genai . NewClient ( ctx , & genai . ClientConfig { APIKey : os . Getenv ( "GEMINI_API_KEY" ), Backend : genai . BackendGeminiAPI , }) localPdfPath := "/path/to/file.pdf" uploadConfig := & genai . UploadFileConfig { MIMEType : "application/pdf" } uploadedFile , _ := client . Files . UploadFromPath ( ctx , localPdfPath , uploadConfig ) promptParts := [] * genai . Part { genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), genai . NewPartFromText ( "Give me a summary of this pdf file." ), } contents := [] * genai . Content { genai . NewContentFromParts ( promptParts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST NUM_BYTES = $( wc -c < " ${ PDF_PATH } " ) DISPLAY_NAME = TEXT tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl " ${ BASE_URL } /upload/v1beta/files?key= ${ GEMINI_API_KEY } " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: application/pdf" \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ PDF_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key= $GOOGLE_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Can you add a few more lines to this poem?"}, {"file_data":{"mime_type": "application/pdf", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json You can verify the API successfully stored the uploaded file and get its metadata by calling files.get . Only the name (and by extension, the uri ) are unique. Python from google import genai import pathlib client = genai . Client () fpath = pathlib . Path ( 'example.txt' ) fpath . write_text ( 'hello' ) file = client . files . upload ( file = 'example.txt' ) file_info = client . files . get ( name = file . name ) print ( file_info . 
model_dump_json ( indent = 4 )) REST name = $( jq ".file.name" file_info.json ) # Get the file of interest to check state curl https://generativelanguage.googleapis.com/v1beta/files/ $name > file_info.json # Print some information about the file you got name = $( jq ".file.name" file_info.json ) echo name = $name file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri Passing multiple PDFs The Gemini API is capable of processing multiple PDF documents (up to 1000 pages) in a single request, as long as the combined size of the documents and the text prompt stays within the model's context window. Python from google import genai import io import httpx client = genai . Client () doc_url_1 = "https://arxiv.org/pdf/2312.11805" doc_url_2 = "https://arxiv.org/pdf/2403.05530" # Retrieve and upload both PDFs using the File API doc_data_1 = io . BytesIO ( httpx . get ( doc_url_1 ) . content ) doc_data_2 = io . BytesIO ( httpx . get ( doc_url_2 ) . content ) sample_pdf_1 = client . files . upload ( file = doc_data_1 , config = dict ( mime_type = 'application/pdf' ) ) sample_pdf_2 = client . files . upload ( file = doc_data_2 , config = dict ( mime_type = 'application/pdf' ) ) prompt = "What is the difference between each of the main benchmarks between these two papers? Output these in a table." response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ sample_pdf_1 , sample_pdf_2 , prompt ]) print ( response . text ) JavaScript import { createPartFromUri , GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : "GEMINI_API_KEY" }); async function uploadRemotePDF ( url , displayName ) { const pdfBuffer = await fetch ( url ) . then (( response ) = > response . arrayBuffer ()); const fileBlob = new Blob ([ pdfBuffer ], { type : 'application/pdf' }); const file = await ai . files . upload ({ file : fileBlob , config : { displayName : displayName , }, }); // Wait for the file to be processed. let getFile = await ai . files . get ({ name : file . name }); while ( getFile . state === 'PROCESSING' ) { getFile = await ai . files . get ({ name : file . name }); console . log ( `current file status: ${ getFile . state } ` ); console . log ( 'File is still processing, retrying in 5 seconds' ); await new Promise (( resolve ) = > { setTimeout ( resolve , 5000 ); }); } if ( file . state === 'FAILED' ) { throw new Error ( 'File processing failed.' ); } return file ; } async function main () { const content = [ 'What is the difference between each of the main benchmarks between these two papers? Output these in a table.' , ]; let file1 = await uploadRemotePDF ( "https://arxiv.org/pdf/2312.11805" , "PDF 1" ) if ( file1 . uri && file1 . mimeType ) { const fileContent = createPartFromUri ( file1 . uri , file1 . mimeType ); content . push ( fileContent ); } let file2 = await uploadRemotePDF ( "https://arxiv.org/pdf/2403.05530" , "PDF 2" ) if ( file2 . uri && file2 . mimeType ) { const fileContent = createPartFromUri ( file2 . uri , file2 . mimeType ); content . push ( fileContent ); } const response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : content , }); console . log ( response . text ); } main (); Go package main import ( "context" "fmt" "io" "net/http" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , _ := genai . NewClient ( ctx , & genai . ClientConfig { APIKey : os . Getenv ( "GEMINI_API_KEY" ), Backend : genai . 
BackendGeminiAPI , }) docUrl1 := "https://arxiv.org/pdf/2312.11805" docUrl2 := "https://arxiv.org/pdf/2403.05530" localPath1 := "doc1_downloaded.pdf" localPath2 := "doc2_downloaded.pdf" respHttp1 , _ := http . Get ( docUrl1 ) defer respHttp1 . Body . Close () outFile1 , _ := os . Create ( localPath1 ) _ , _ = io . Copy ( outFile1 , respHttp1 . Body ) outFile1 . Close () respHttp2 , _ := http . Get ( docUrl2 ) defer respHttp2 . Body . Close () outFile2 , _ := os . Create ( localPath2 ) _ , _ = io . Copy ( outFile2 , respHttp2 . Body ) outFile2 . Close () uploadConfig1 := & genai . UploadFileConfig { MIMEType : "application/pdf" } uploadedFile1 , _ := client . Files . UploadFromPath ( ctx , localPath1 , uploadConfig1 ) uploadConfig2 := & genai . UploadFileConfig { MIMEType : "application/pdf" } uploadedFile2 , _ := client . Files . UploadFromPath ( ctx , localPath2 , uploadConfig2 ) promptParts := [] * genai . Part { genai . NewPartFromURI ( uploadedFile1 . URI , uploadedFile1 . MIMEType ), genai . NewPartFromURI ( uploadedFile2 . URI , uploadedFile2 . MIMEType ), genai . NewPartFromText ( "What is the difference between each of the " + "main benchmarks between these two papers? " + "Output these in a table." ), } contents := [] * genai . Content { genai . NewContentFromParts ( promptParts , genai . RoleUser ), } modelName := "gemini-2.5-flash" result , _ := client . Models . GenerateContent ( ctx , modelName , contents , nil , ) fmt . Println ( result . Text ()) } REST DOC_URL_1 = "https://arxiv.org/pdf/2312.11805" DOC_URL_2 = "https://arxiv.org/pdf/2403.05530" DISPLAY_NAME_1 = "Gemini_paper" DISPLAY_NAME_2 = "Gemini_1.5_paper" PROMPT = "What is the difference between each of the main benchmarks between these two papers? Output these in a table." # Function to download and upload a PDF upload_pdf () { local doc_url = " $1 " local display_name = " $2 " # Download the PDF wget -O " ${ display_name } .pdf" " ${ doc_url } " local MIME_TYPE = $( file -b --mime-type " ${ display_name } .pdf" ) local NUM_BYTES = $( wc -c < " ${ display_name } .pdf" ) echo "MIME_TYPE: ${ MIME_TYPE } " echo "NUM_BYTES: ${ NUM_BYTES } " local tmp_header_file = upload-header.tmp # Initial resumable request curl " ${ BASE_URL } /upload/v1beta/files?key= ${ GOOGLE_API_KEY } " \ -D " ${ tmp_header_file } " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ display_name } '}}" 2 > /dev/null local upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the PDF curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ display_name } .pdf" 2 > /dev/null > "file_info_ ${ display_name } .json" local file_uri = $( jq ".file.uri" "file_info_ ${ display_name } .json" ) echo "file_uri for ${ display_name } : ${ file_uri } " # Clean up the downloaded PDF rm " ${ display_name } .pdf" echo " ${ file_uri } " } # Upload the first PDF file_uri_1 = $( upload_pdf " ${ DOC_URL_1 } " " ${ DISPLAY_NAME_1 } " ) # Upload the second PDF file_uri_2 = $( upload_pdf " ${ DOC_URL_2 } " " ${ DISPLAY_NAME_2 } " ) # Now generate content using both files curl 
"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key= $GOOGLE_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"file_data": {"mime_type": "application/pdf", "file_uri": ' $file_uri_1 '}}, {"file_data": {"mime_type": "application/pdf", "file_uri": ' $file_uri_2 '}}, {"text": "' $PROMPT '"} ] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Technical details Gemini supports a maximum of 1,000 document pages. Each document page is equivalent to 258 tokens. While there are no specific limits to the number of pixels in a document besides the model's context window , larger pages are scaled down to a maximum resolution of 3072x3072 while preserving their original aspect ratio, while smaller pages are scaled up to 768x768 pixels. There is no cost reduction for pages at lower sizes, other than bandwidth, or performance improvement for pages at higher resolution. Document types Technically, you can pass other MIME types for document understanding, like TXT, Markdown, HTML, XML, etc. However, document vision only meaningfully understands PDFs . Other types will be extracted as pure text, and the model won't be able to interpret what we see in the rendering of those files. Any file-type specifics like charts, diagrams, HTML tags, Markdown formatting, etc., will be lost. Best practices For best results: Rotate pages to the correct orientation before uploading. Avoid blurry pages. If using a single page, place the text prompt after the page. What's next To learn more, see the following resources: File prompting strategies : The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting. System instructions : System instructions let you steer the behavior of the model based on your specific needs and use cases. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-09 UTC.
 
2
  Title: Document understanding  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Document understanding | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Document understanding Gemini models can process documents in PDF format, using native vision to understand entire document contexts. This goes beyond simple text extraction, allowing Gemini to: Analyze and interpret content, including text, images, diagrams, charts, and tables, even in long documents up to 1000 pages. Extract information into structured output formats. Summarize and answer questions based on both the visual and textual elements in a document. Transcribe document content (e.g. to HTML), preserving layouts and formatting, for use in downstream applications. Passing inline PDF data You can pass inline PDF data in the request to generateContent . For PDF payloads under 20MB, you can choose between uploading base64 encoded documents or directly uploading locally stored files. The following example shows you how to fetch a PDF from a URL and convert it to bytes for processing: Python from google import genai from google.genai import types import httpx client = genai . Client () doc_url = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf" # Retrieve and encode the PDF byte doc_data = httpx . get ( doc_url ) . content prompt = "Summarize this document" response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ types . Part . from_bytes ( data = doc_data , mime_type = 'application/pdf' , ), prompt ]) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : "GEMINI_API_KEY" }); async function main () { const pdfResp = await fetch ( 'https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf' ) . then (( response ) = > response . arrayBuffer ()); const contents = [ { text : "Summarize this document" }, { inlineData : { mimeType : 'application/pdf' , data : Buffer . from ( pdfResp ). toString ( "base64" ) } } ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents }); console . log ( response . text ); } main (); Go package main import ( "context" "fmt" "io" "net/http" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , _ := genai . NewClient ( ctx , & genai . ClientConfig { APIKey : os . Getenv ( "GEMINI_API_KEY" ), Backend : genai . BackendGeminiAPI , }) pdfResp , _ := http . Get ( "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf" ) var pdfBytes [] byte if pdfResp != nil && pdfResp . Body != nil { pdfBytes , _ = io . ReadAll ( pdfResp . Body ) pdfResp . Body . Close () } parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "application/pdf" , Data : pdfBytes , }, }, genai . NewPartFromText ( "Summarize this document" ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . 
Text ()) } REST DOC_URL = "https://discovery.ucl.ac.uk/id/eprint/10089234/1/343019_3_art_0_py4t4l_convrt.pdf" PROMPT = "Summarize this document" DISPLAY_NAME = "base64_pdf" # Download the PDF wget -O " ${ DISPLAY_NAME } .pdf" " ${ DOC_URL } " # Check for FreeBSD base64 and set flags accordingly if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi # Base64 encode the PDF ENCODED_PDF = $( base64 $B64FLAGS " ${ DISPLAY_NAME } .pdf" ) # Generate content using the base64 encoded PDF curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key= $GOOGLE_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"inline_data": {"mime_type": "application/pdf", "data": "' " $ENCODED_PDF " '"}}, {"text": "' $PROMPT '"} ] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json # Clean up the downloaded PDF rm " ${ DISPLAY_NAME } .pdf" You can also read a PDF from a local file for processing: Python from google import genai from google.genai import types import pathlib client = genai . Client () # Retrieve and encode the PDF byte filepath = pathlib . Path ( 'file.pdf' ) prompt = "Summarize this document" response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ types . Part . from_bytes ( data = filepath . read_bytes (), mime_type = 'application/pdf' , ), prompt ]) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from 'fs' ; const ai = new GoogleGenAI ({ apiKey : "GEMINI_API_KEY" }); async function main () { const contents = [ { text : "Summarize this document" }, { inlineData : { mimeType : 'application/pdf' , data : Buffer . from ( fs . readFileSync ( "content/343019_3_art_0_py4t4l_convrt.pdf" )). toString ( "base64" ) } } ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents }); console . log ( response . text ); } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , _ := genai . NewClient ( ctx , & genai . ClientConfig { APIKey : os . Getenv ( "GEMINI_API_KEY" ), Backend : genai . BackendGeminiAPI , }) pdfBytes , _ := os . ReadFile ( "path/to/your/file.pdf" ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "application/pdf" , Data : pdfBytes , }, }, genai . NewPartFromText ( "Summarize this document" ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } Uploading PDFs using the File API You can use the File API to upload larger documents. Always use the File API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20MB. Note: The File API lets you store up to 50MB of PDF files. Files are stored for 48 hours. You can access them in that period with your API key, but you can't download them from the API. The File API is available at no cost in all regions where the Gemini API is available. Call media.upload to upload a file using the File API. The following code uploads a document file and then uses the file in a call to models.generateContent . 
Large PDFs from URLs Use the File API to simplify uploading and processing large PDF files from URLs: Python from google import genai from google.genai import types import io import httpx client = genai . Client () long_context_pdf_path = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" # Retrieve and upload the PDF using the File API doc_io = io . BytesIO ( httpx . get ( long_context_pdf_path ) . content ) sample_doc = client . files . upload ( # You can pass a path or a file-like object here file = doc_io , config = dict ( mime_type = 'application/pdf' ) ) prompt = "Summarize this document" response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ sample_doc , prompt ]) print ( response . text ) JavaScript import { createPartFromUri , GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : "GEMINI_API_KEY" }); async function main () { const pdfBuffer = await fetch ( "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" ) . then (( response ) = > response . arrayBuffer ()); const fileBlob = new Blob ([ pdfBuffer ], { type : 'application/pdf' }); const file = await ai . files . upload ({ file : fileBlob , config : { displayName : 'A17_FlightPlan.pdf' , }, }); // Wait for the file to be processed. let getFile = await ai . files . get ({ name : file . name }); while ( getFile . state === 'PROCESSING' ) { getFile = await ai . files . get ({ name : file . name }); console . log ( `current file status: ${ getFile . state } ` ); console . log ( 'File is still processing, retrying in 5 seconds' ); await new Promise (( resolve ) = > { setTimeout ( resolve , 5000 ); }); } if ( file . state === 'FAILED' ) { throw new Error ( 'File processing failed.' ); } // Add the file to the contents. const content = [ 'Summarize this document' , ]; if ( file . uri && file . mimeType ) { const fileContent = createPartFromUri ( file . uri , file . mimeType ); content . push ( fileContent ); } const response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : content , }); console . log ( response . text ); } main (); Go package main import ( "context" "fmt" "io" "net/http" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , _ := genai . NewClient ( ctx , & genai . ClientConfig { APIKey : os . Getenv ( "GEMINI_API_KEY" ), Backend : genai . BackendGeminiAPI , }) pdfURL := "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" localPdfPath := "A17_FlightPlan_downloaded.pdf" respHttp , _ := http . Get ( pdfURL ) defer respHttp . Body . Close () outFile , _ := os . Create ( localPdfPath ) defer outFile . Close () _ , _ = io . Copy ( outFile , respHttp . Body ) uploadConfig := & genai . UploadFileConfig { MIMEType : "application/pdf" } uploadedFile , _ := client . Files . UploadFromPath ( ctx , localPdfPath , uploadConfig ) promptParts := [] * genai . Part { genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), genai . NewPartFromText ( "Summarize this document" ), } contents := [] * genai . Content { genai . NewContentFromParts ( promptParts , genai . RoleUser ), // Specify role } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . 
Text ()) } REST PDF_PATH = "https://www.nasa.gov/wp-content/uploads/static/history/alsj/a17/A17_FlightPlan.pdf" DISPLAY_NAME = "A17_FlightPlan" PROMPT = "Summarize this document" # Download the PDF from the provided URL wget -O " ${ DISPLAY_NAME } .pdf" " ${ PDF_PATH } " MIME_TYPE = $( file -b --mime-type " ${ DISPLAY_NAME } .pdf" ) NUM_BYTES = $( wc -c < " ${ DISPLAY_NAME } .pdf" ) echo "MIME_TYPE: ${ MIME_TYPE } " echo "NUM_BYTES: ${ NUM_BYTES } " tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl " ${ BASE_URL } /upload/v1beta/files?key= ${ GOOGLE_API_KEY } " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ DISPLAY_NAME } .pdf" 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo "file_uri: ${ file_uri } " # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key= $GOOGLE_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "' $PROMPT '"}, {"file_data":{"mime_type": "application/pdf", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json # Clean up the downloaded PDF rm " ${ DISPLAY_NAME } .pdf" Large PDFs stored locally Python from google import genai from google.genai import types import pathlib import httpx client = genai . Client () # Retrieve and encode the PDF byte file_path = pathlib . Path ( 'large_file.pdf' ) # Upload the PDF using the File API sample_file = client . files . upload ( file = file_path , ) prompt = "Summarize this document" response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ sample_file , "Summarize this document" ]) print ( response . text ) JavaScript import { createPartFromUri , GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : "GEMINI_API_KEY" }); async function main () { const file = await ai . files . upload ({ file : 'path-to-localfile.pdf' config : { displayName : 'A17_FlightPlan.pdf' , }, }); // Wait for the file to be processed. let getFile = await ai . files . get ({ name : file . name }); while ( getFile . state === 'PROCESSING' ) { getFile = await ai . files . get ({ name : file . name }); console . log ( `current file status: ${ getFile . state } ` ); console . log ( 'File is still processing, retrying in 5 seconds' ); await new Promise (( resolve ) = > { setTimeout ( resolve , 5000 ); }); } if ( file . state === 'FAILED' ) { throw new Error ( 'File processing failed.' ); } // Add the file to the contents. const content = [ 'Summarize this document' , ]; if ( file . uri && file . mimeType ) { const fileContent = createPartFromUri ( file . uri , file . mimeType ); content . push ( fileContent ); } const response = await ai . models . 
generateContent ({ model : 'gemini-2.5-flash' , contents : content , }); console . log ( response . text ); } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , _ := genai . NewClient ( ctx , & genai . ClientConfig { APIKey : os . Getenv ( "GEMINI_API_KEY" ), Backend : genai . BackendGeminiAPI , }) localPdfPath := "/path/to/file.pdf" uploadConfig := & genai . UploadFileConfig { MIMEType : "application/pdf" } uploadedFile , _ := client . Files . UploadFromPath ( ctx , localPdfPath , uploadConfig ) promptParts := [] * genai . Part { genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), genai . NewPartFromText ( "Give me a summary of this pdf file." ), } contents := [] * genai . Content { genai . NewContentFromParts ( promptParts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST NUM_BYTES = $( wc -c < " ${ PDF_PATH } " ) DISPLAY_NAME = TEXT tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl " ${ BASE_URL } /upload/v1beta/files?key= ${ GEMINI_API_KEY } " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: application/pdf" \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ PDF_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key= $GOOGLE_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Can you add a few more lines to this poem?"}, {"file_data":{"mime_type": "application/pdf", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json You can verify the API successfully stored the uploaded file and get its metadata by calling files.get . Only the name (and by extension, the uri ) are unique. Python from google import genai import pathlib client = genai . Client () fpath = pathlib . Path ( 'example.txt' ) fpath . write_text ( 'hello' ) file = client . files . upload ( file = 'example.txt' ) file_info = client . files . get ( name = file . name ) print ( file_info . 
model_dump_json ( indent = 4 )) REST name = $( jq ".file.name" file_info.json ) # Get the file of interest to check state curl https://generativelanguage.googleapis.com/v1beta/files/ $name > file_info.json # Print some information about the file you got name = $( jq ".file.name" file_info.json ) echo name = $name file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri Passing multiple PDFs The Gemini API is capable of processing multiple PDF documents (up to 1000 pages) in a single request, as long as the combined size of the documents and the text prompt stays within the model's context window. Python from google import genai import io import httpx client = genai . Client () doc_url_1 = "https://arxiv.org/pdf/2312.11805" doc_url_2 = "https://arxiv.org/pdf/2403.05530" # Retrieve and upload both PDFs using the File API doc_data_1 = io . BytesIO ( httpx . get ( doc_url_1 ) . content ) doc_data_2 = io . BytesIO ( httpx . get ( doc_url_2 ) . content ) sample_pdf_1 = client . files . upload ( file = doc_data_1 , config = dict ( mime_type = 'application/pdf' ) ) sample_pdf_2 = client . files . upload ( file = doc_data_2 , config = dict ( mime_type = 'application/pdf' ) ) prompt = "What is the difference between each of the main benchmarks between these two papers? Output these in a table." response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ sample_pdf_1 , sample_pdf_2 , prompt ]) print ( response . text ) JavaScript import { createPartFromUri , GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : "GEMINI_API_KEY" }); async function uploadRemotePDF ( url , displayName ) { const pdfBuffer = await fetch ( url ) . then (( response ) = > response . arrayBuffer ()); const fileBlob = new Blob ([ pdfBuffer ], { type : 'application/pdf' }); const file = await ai . files . upload ({ file : fileBlob , config : { displayName : displayName , }, }); // Wait for the file to be processed. let getFile = await ai . files . get ({ name : file . name }); while ( getFile . state === 'PROCESSING' ) { getFile = await ai . files . get ({ name : file . name }); console . log ( `current file status: ${ getFile . state } ` ); console . log ( 'File is still processing, retrying in 5 seconds' ); await new Promise (( resolve ) = > { setTimeout ( resolve , 5000 ); }); } if ( file . state === 'FAILED' ) { throw new Error ( 'File processing failed.' ); } return file ; } async function main () { const content = [ 'What is the difference between each of the main benchmarks between these two papers? Output these in a table.' , ]; let file1 = await uploadRemotePDF ( "https://arxiv.org/pdf/2312.11805" , "PDF 1" ) if ( file1 . uri && file1 . mimeType ) { const fileContent = createPartFromUri ( file1 . uri , file1 . mimeType ); content . push ( fileContent ); } let file2 = await uploadRemotePDF ( "https://arxiv.org/pdf/2403.05530" , "PDF 2" ) if ( file2 . uri && file2 . mimeType ) { const fileContent = createPartFromUri ( file2 . uri , file2 . mimeType ); content . push ( fileContent ); } const response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : content , }); console . log ( response . text ); } main (); Go package main import ( "context" "fmt" "io" "net/http" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , _ := genai . NewClient ( ctx , & genai . ClientConfig { APIKey : os . Getenv ( "GEMINI_API_KEY" ), Backend : genai . 
BackendGeminiAPI , }) docUrl1 := "https://arxiv.org/pdf/2312.11805" docUrl2 := "https://arxiv.org/pdf/2403.05530" localPath1 := "doc1_downloaded.pdf" localPath2 := "doc2_downloaded.pdf" respHttp1 , _ := http . Get ( docUrl1 ) defer respHttp1 . Body . Close () outFile1 , _ := os . Create ( localPath1 ) _ , _ = io . Copy ( outFile1 , respHttp1 . Body ) outFile1 . Close () respHttp2 , _ := http . Get ( docUrl2 ) defer respHttp2 . Body . Close () outFile2 , _ := os . Create ( localPath2 ) _ , _ = io . Copy ( outFile2 , respHttp2 . Body ) outFile2 . Close () uploadConfig1 := & genai . UploadFileConfig { MIMEType : "application/pdf" } uploadedFile1 , _ := client . Files . UploadFromPath ( ctx , localPath1 , uploadConfig1 ) uploadConfig2 := & genai . UploadFileConfig { MIMEType : "application/pdf" } uploadedFile2 , _ := client . Files . UploadFromPath ( ctx , localPath2 , uploadConfig2 ) promptParts := [] * genai . Part { genai . NewPartFromURI ( uploadedFile1 . URI , uploadedFile1 . MIMEType ), genai . NewPartFromURI ( uploadedFile2 . URI , uploadedFile2 . MIMEType ), genai . NewPartFromText ( "What is the difference between each of the " + "main benchmarks between these two papers? " + "Output these in a table." ), } contents := [] * genai . Content { genai . NewContentFromParts ( promptParts , genai . RoleUser ), } modelName := "gemini-2.5-flash" result , _ := client . Models . GenerateContent ( ctx , modelName , contents , nil , ) fmt . Println ( result . Text ()) } REST DOC_URL_1 = "https://arxiv.org/pdf/2312.11805" DOC_URL_2 = "https://arxiv.org/pdf/2403.05530" DISPLAY_NAME_1 = "Gemini_paper" DISPLAY_NAME_2 = "Gemini_1.5_paper" PROMPT = "What is the difference between each of the main benchmarks between these two papers? Output these in a table." # Function to download and upload a PDF upload_pdf () { local doc_url = " $1 " local display_name = " $2 " # Download the PDF wget -O " ${ display_name } .pdf" " ${ doc_url } " local MIME_TYPE = $( file -b --mime-type " ${ display_name } .pdf" ) local NUM_BYTES = $( wc -c < " ${ display_name } .pdf" ) echo "MIME_TYPE: ${ MIME_TYPE } " echo "NUM_BYTES: ${ NUM_BYTES } " local tmp_header_file = upload-header.tmp # Initial resumable request curl " ${ BASE_URL } /upload/v1beta/files?key= ${ GOOGLE_API_KEY } " \ -D " ${ tmp_header_file } " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ display_name } '}}" 2 > /dev/null local upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the PDF curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ display_name } .pdf" 2 > /dev/null > "file_info_ ${ display_name } .json" local file_uri = $( jq ".file.uri" "file_info_ ${ display_name } .json" ) echo "file_uri for ${ display_name } : ${ file_uri } " # Clean up the downloaded PDF rm " ${ display_name } .pdf" echo " ${ file_uri } " } # Upload the first PDF file_uri_1 = $( upload_pdf " ${ DOC_URL_1 } " " ${ DISPLAY_NAME_1 } " ) # Upload the second PDF file_uri_2 = $( upload_pdf " ${ DOC_URL_2 } " " ${ DISPLAY_NAME_2 } " ) # Now generate content using both files curl 
"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key= $GOOGLE_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"file_data": {"mime_type": "application/pdf", "file_uri": ' $file_uri_1 '}}, {"file_data": {"mime_type": "application/pdf", "file_uri": ' $file_uri_2 '}}, {"text": "' $PROMPT '"} ] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Technical details Gemini supports a maximum of 1,000 document pages. Each document page is equivalent to 258 tokens. While there are no specific limits to the number of pixels in a document besides the model's context window , larger pages are scaled down to a maximum resolution of 3072x3072 while preserving their original aspect ratio, while smaller pages are scaled up to 768x768 pixels. There is no cost reduction for pages at lower sizes, other than bandwidth, or performance improvement for pages at higher resolution. Document types Technically, you can pass other MIME types for document understanding, like TXT, Markdown, HTML, XML, etc. However, document vision only meaningfully understands PDFs . Other types will be extracted as pure text, and the model won't be able to interpret what we see in the rendering of those files. Any file-type specifics like charts, diagrams, HTML tags, Markdown formatting, etc., will be lost. Best practices For best results: Rotate pages to the correct orientation before uploading. Avoid blurry pages. If using a single page, place the text prompt after the page. What's next To learn more, see the following resources: File prompting strategies : The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting. System instructions : System instructions let you steer the behavior of the model based on your specific needs and use cases. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
text_content/docs_downloads_a1919083.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/downloads#main-content
2
  Title: Gemini API libraries  |  Google AI for Developers
3
  ==================================================
4
 
5
- Gemini API libraries | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Gemini API libraries When building with the Gemini API, we recommend using our official collection of libraries across major languages: the Google GenAI SDK . They are production ready under General Availability . Our samples and documentation across this site are built using these libraries. Note: If you're using one of our legacy libraries, we strongly recommend you migrate to the Google GenAI SDK. Review the legacy libraries section for more information. If you're new to the Gemini API, follow our quickstart guide to get started. Language support and installation The Google GenAI SDK is available for the Python, JavaScript/TypeScript, Go and Java languages. You can install each language's library using package managers, or visit their GitHub repos for further engagement: Python Library: google-genai GitHub Repository: googleapis/python-genai Installation: pip install google-genai JavaScript Library: @google/genai GitHub Repository: googleapis/js-genai Installation: npm install @google/genai Go Library: google.golang.org/genai GitHub Repository: googleapis/go-genai Installation: go get google.golang.org/genai Java Library: google-genai GitHub Repository: googleapis/java-genai Installation: If you're using Maven, add the following to your dependencies: <dependencies> <dependency> <groupId>com.google.genai</groupId> <artifactId>google-genai</artifactId> <version>1.0.0</version> </dependency> </dependencies> General availability We started rolling out the Google GenAI SDK in late 2024. As of May 2025, it reached General Availability (GA) across all supported platforms. This means the libraries are stable and fully supported for production use. They are actively maintained, provide access to the latest features, and offer the best performance working with Gemini. If you're not using the Google GenAI SDK and using one of our legacy libraries, we strongly recommend you to migrate. Review the legacy libraries section for more information. Legacy libraries and migration If you are using one of our legacy libraries, we recommend that you migrate to the new libraries . The legacy libraries don't provide access to recent features (such as Live API and Veo ) and are on a deprecation path. They will stop receiving updates at the end of September 2025, the feature gaps will grow and potential bugs may no longer get fixed. Each legacy library's support status varies, detailed in the following table: Language Legacy library Support status Recommended library Python google-generativeai All support, including bug fixes, ends end of September 2025. google-genai JavaScript/TypeScript @google/generativeai All support, including bug fixes, ends end of September 2025. @google/genai Go google.golang.org/generative-ai All support, including bug fixes, ends end of September 2025. 
google.golang.org/genai Dart and Flutter google_generative_ai Not actively maintained Use trusted community or third party libraries, like firebase_ai , or access using REST API Swift generative-ai-swift Not actively maintained Use Gemini in Firebase Android generative-ai-android Not actively maintained Use Gemini in Firebase Note for Java developers: There was no legacy Google-provided Java SDK for the Gemini API, so no migration from a previous Google library is required. You can start directly with the new library in the Language support and installation section. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-09 UTC.
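For the Python row of the migration table above, the change mostly amounts to swapping the module-level legacy API for the client-based Google GenAI SDK. The before/after sketch below is a hedged illustration from general knowledge of the two packages, not an official migration recipe; model names are placeholders.

# Before: legacy google-generativeai library (support ending, per the table above).
import google.generativeai as legacy_genai

legacy_genai.configure(api_key="YOUR_API_KEY")
model = legacy_genai.GenerativeModel("gemini-1.5-flash")
print(model.generate_content("Say hello in one short sentence.").text)

# After: Google GenAI SDK (google-genai), the recommended library.
from google import genai

client = genai.Client()  # picks up the API key from the environment
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Say hello in one short sentence.",
)
print(response.text)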
 
2
  Title: Gemini API libraries  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Gemini API libraries | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Gemini API libraries When building with the Gemini API, we recommend using the Google GenAI SDK . These are the official, production-ready libraries that we develop and maintain for the most popular languages. They are in General Availability and used in all our official documentation and examples. Note: If you're using one of our legacy libraries, we strongly recommend you migrate to the Google GenAI SDK. Review the legacy libraries section for more information. If you're new to the Gemini API, follow our quickstart guide to get started. Language support and installation The Google GenAI SDK is available for the Python, JavaScript/TypeScript, Go and Java languages. You can install each language's library using package managers, or visit their GitHub repos for further engagement: Python Library: google-genai GitHub Repository: googleapis/python-genai Installation: pip install google-genai JavaScript Library: @google/genai GitHub Repository: googleapis/js-genai Installation: npm install @google/genai Go Library: google.golang.org/genai GitHub Repository: googleapis/go-genai Installation: go get google.golang.org/genai Java Library: google-genai GitHub Repository: googleapis/java-genai Installation: If you're using Maven, add the following to your dependencies: <dependencies> <dependency> <groupId>com.google.genai</groupId> <artifactId>google-genai</artifactId> <version>1.0.0</version> </dependency> </dependencies> General availability We started rolling out Google GenAI SDK, a new set of libraries to access Gemini API, in late 2024 when we launched Gemini 2.0. As of May 2025, they reached General Availability (GA) across all supported platforms and are the recommended libraries to access the Gemini API. They are stable, fully supported for production use, and are actively maintained. They provide access to the latest features, and offer the best performance working with Gemini. If you're using one of our legacy libraries, we strongly recommend you migrate so that you can access the latest features and get the best performance working with Gemini. Review the legacy libraries section for more information. Legacy libraries and migration If you are using one of our legacy libraries, we recommend that you migrate to the new libraries . The legacy libraries don't provide access to recent features (such as Live API and Veo ) and are on a deprecation path. They will stop receiving updates on November 30th, 2025, the feature gaps will grow and potential bugs may no longer get fixed. Each legacy library's support status varies, detailed in the following table: Language Legacy library Support status Recommended library Python google-generativeai All support, including bug fixes, ends on November 30th, 2025. google-genai JavaScript/TypeScript @google/generativeai All support, including bug fixes, ends on November 30th, 2025. @google/genai Go google.golang.org/generative-ai All support, including bug fixes, ends on November 30th, 2025. 
google.golang.org/genai Dart and Flutter google_generative_ai Not actively maintained Use trusted community or third party libraries, like firebase_ai , or access using REST API Swift generative-ai-swift Not actively maintained Use Firebase AI Logic Android generative-ai-android Not actively maintained Use Firebase AI Logic Note for Java developers: There was no legacy Google-provided Java SDK for the Gemini API, so no migration from a previous Google library is required. You can start directly with the new library in the Language support and installation section. Prompt templates for code generation Generative models (e.g., Gemini, Claude) and AI-powered IDEs (e.g., Cursor) may produce code for the Gemini API using outdated or deprecated libraries due to their training data cutoff. For the generated code to use the latest, recommended libraries, provide version and usage guidance directly in your prompts. You can use the templates below to provide the necessary context: Python JavaScript/TypeScript Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
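As an illustration of the migration this page describes, the following minimal Python sketch contrasts a typical legacy google-generativeai call with its Google GenAI SDK equivalent. It assumes the GEMINI_API_KEY environment variable is set; the model name and prompt are placeholders rather than recommendations, and the legacy lines are left commented out so the snippet runs with only google-genai installed.

# Minimal migration sketch: legacy google-generativeai call vs. Google GenAI SDK.
# Assumes the GEMINI_API_KEY environment variable is set; model name and prompt
# are placeholders. The legacy lines are commented out so this file runs with
# only `pip install google-genai`.

# Legacy library (all support ends November 30th, 2025):
# import google.generativeai as legacy_genai
# legacy_genai.configure(api_key="YOUR_API_KEY")
# legacy_model = legacy_genai.GenerativeModel("gemini-1.5-flash")
# print(legacy_model.generate_content("Say hello in one sentence.").text)

# Google GenAI SDK (recommended):
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Say hello in one sentence.",
)
print(response.text)

The structural change is similar in the other supported languages: a single client object replaces per-model objects, and requests go through the client's models surface.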
text_content/docs_downloads_a8fc2042.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/downloads
2
  Title: Gemini API libraries  |  Google AI for Developers
3
  ==================================================
4
 
5
- Gemini API libraries | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Gemini API libraries When building with the Gemini API, we recommend using our official collection of libraries across major languages: the Google GenAI SDK . They are production ready under General Availability . Our samples and documentation across this site are built using these libraries. Note: If you're using one of our legacy libraries, we strongly recommend you migrate to the Google GenAI SDK. Review the legacy libraries section for more information. If you're new to the Gemini API, follow our quickstart guide to get started. Language support and installation The Google GenAI SDK is available for the Python, JavaScript/TypeScript, Go and Java languages. You can install each language's library using package managers, or visit their GitHub repos for further engagement: Python Library: google-genai GitHub Repository: googleapis/python-genai Installation: pip install google-genai JavaScript Library: @google/genai GitHub Repository: googleapis/js-genai Installation: npm install @google/genai Go Library: google.golang.org/genai GitHub Repository: googleapis/go-genai Installation: go get google.golang.org/genai Java Library: google-genai GitHub Repository: googleapis/java-genai Installation: If you're using Maven, add the following to your dependencies: <dependencies> <dependency> <groupId>com.google.genai</groupId> <artifactId>google-genai</artifactId> <version>1.0.0</version> </dependency> </dependencies> General availability We started rolling out the Google GenAI SDK in late 2024. As of May 2025, it reached General Availability (GA) across all supported platforms. This means the libraries are stable and fully supported for production use. They are actively maintained, provide access to the latest features, and offer the best performance working with Gemini. If you're not using the Google GenAI SDK and using one of our legacy libraries, we strongly recommend you to migrate. Review the legacy libraries section for more information. Legacy libraries and migration If you are using one of our legacy libraries, we recommend that you migrate to the new libraries . The legacy libraries don't provide access to recent features (such as Live API and Veo ) and are on a deprecation path. They will stop receiving updates at the end of September 2025, the feature gaps will grow and potential bugs may no longer get fixed. Each legacy library's support status varies, detailed in the following table: Language Legacy library Support status Recommended library Python google-generativeai All support, including bug fixes, ends end of September 2025. google-genai JavaScript/TypeScript @google/generativeai All support, including bug fixes, ends end of September 2025. @google/genai Go google.golang.org/generative-ai All support, including bug fixes, ends end of September 2025. 
google.golang.org/genai Dart and Flutter google_generative_ai Not actively maintained Use trusted community or third party libraries, like firebase_ai , or access using REST API Swift generative-ai-swift Not actively maintained Use Gemini in Firebase Android generative-ai-android Not actively maintained Use Gemini in Firebase Note for Java developers: There was no legacy Google-provided Java SDK for the Gemini API, so no migration from a previous Google library is required. You can start directly with the new library in the Language support and installation section. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-09 UTC.
 
2
  Title: Gemini API libraries  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Gemini API libraries | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Gemini API libraries When building with the Gemini API, we recommend using the Google GenAI SDK . These are the official, production-ready libraries that we develop and maintain for the most popular languages. They are in General Availability and used in all our official documentation and examples. Note: If you're using one of our legacy libraries, we strongly recommend you migrate to the Google GenAI SDK. Review the legacy libraries section for more information. If you're new to the Gemini API, follow our quickstart guide to get started. Language support and installation The Google GenAI SDK is available for the Python, JavaScript/TypeScript, Go and Java languages. You can install each language's library using package managers, or visit their GitHub repos for further engagement: Python Library: google-genai GitHub Repository: googleapis/python-genai Installation: pip install google-genai JavaScript Library: @google/genai GitHub Repository: googleapis/js-genai Installation: npm install @google/genai Go Library: google.golang.org/genai GitHub Repository: googleapis/go-genai Installation: go get google.golang.org/genai Java Library: google-genai GitHub Repository: googleapis/java-genai Installation: If you're using Maven, add the following to your dependencies: <dependencies> <dependency> <groupId>com.google.genai</groupId> <artifactId>google-genai</artifactId> <version>1.0.0</version> </dependency> </dependencies> General availability We started rolling out Google GenAI SDK, a new set of libraries to access Gemini API, in late 2024 when we launched Gemini 2.0. As of May 2025, they reached General Availability (GA) across all supported platforms and are the recommended libraries to access the Gemini API. They are stable, fully supported for production use, and are actively maintained. They provide access to the latest features, and offer the best performance working with Gemini. If you're using one of our legacy libraries, we strongly recommend you migrate so that you can access the latest features and get the best performance working with Gemini. Review the legacy libraries section for more information. Legacy libraries and migration If you are using one of our legacy libraries, we recommend that you migrate to the new libraries . The legacy libraries don't provide access to recent features (such as Live API and Veo ) and are on a deprecation path. They will stop receiving updates on November 30th, 2025, the feature gaps will grow and potential bugs may no longer get fixed. Each legacy library's support status varies, detailed in the following table: Language Legacy library Support status Recommended library Python google-generativeai All support, including bug fixes, ends on November 30th, 2025. google-genai JavaScript/TypeScript @google/generativeai All support, including bug fixes, ends on November 30th, 2025. @google/genai Go google.golang.org/generative-ai All support, including bug fixes, ends on November 30th, 2025. 
google.golang.org/genai Dart and Flutter google_generative_ai Not actively maintained Use trusted community or third party libraries, like firebase_ai , or access using REST API Swift generative-ai-swift Not actively maintained Use Firebase AI Logic Android generative-ai-android Not actively maintained Use Firebase AI Logic Note for Java developers: There was no legacy Google-provided Java SDK for the Gemini API, so no migration from a previous Google library is required. You can start directly with the new library in the Language support and installation section. Prompt templates for code generation Generative models (e.g., Gemini, Claude) and AI-powered IDEs (e.g., Cursor) may produce code for the Gemini API using outdated or deprecated libraries due to their training data cutoff. For the generated code to use the latest, recommended libraries, provide version and usage guidance directly in your prompts. You can use the templates below to provide the necessary context: Python JavaScript/TypeScript Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
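Because generated code can silently target the deprecated packages listed above, it can help to verify programmatically which SDK is actually installed before running such code. The following sketch uses only the Python standard library; the package names are the PyPI names from the tables above.

# Check which Gemini SDK packages are installed, and at which version.
# Uses only the Python standard library; package names are the PyPI names above.
from importlib import metadata

for package in ("google-genai", "google-generativeai"):
    try:
        print(f"{package}: version {metadata.version(package)} installed")
    except metadata.PackageNotFoundError:
        print(f"{package}: not installed")

If only google-generativeai shows up, the migration guide linked above covers moving to google-genai before the November 30th, 2025 cutoff.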
text_content/docs_embeddings_2bb4dd63.txt ADDED
@@ -0,0 +1,5 @@
1
+ URL: https://ai.google.dev/gemini-api/docs/embeddings#model-versions
2
+ Title: Embeddings  |  Gemini API  |  Google AI for Developers
3
+ ==================================================
4
+
5
+ Embeddings | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Embeddings The Gemini API offers text embedding models to generate embeddings for words, phrases, sentences, and code. These foundational embeddings power advanced NLP tasks such as semantic search, classification, and clustering, providing more accurate, context-aware results than keyword-based approaches. Building Retrieval Augmented Generation (RAG) systems is a common use case for embeddings. Embeddings play a key role in significantly enhancing model outputs with improved factual accuracy, coherence, and contextual richness. They efficiently retrieve relevant information from knowledge bases, represented by embeddings, which are then passed as additional context in the input prompt to language models, guiding it to generate more informed and accurate responses. To learn more about the available embedding model variants, see the Model versions section. For higher throughput serving at half the price, try Batch API Embedding . Generating embeddings Use the embedContent method to generate text embeddings: Python from google import genai client = genai . Client () result = client . models . embed_content ( model = "gemini-embedding-001" , contents = "What is the meaning of life?" ) print ( result . embeddings ) JavaScript import { GoogleGenAI } from "@google/genai" ; async function main () { const ai = new GoogleGenAI ({}); const response = await ai . models . embedContent ({ model : 'gemini-embedding-001' , contents : 'What is the meaning of life?' , }); console . log ( response . embeddings ); } main (); Go package main import ( "context" "encoding/json" "fmt" "log" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } contents := [] * genai . Content { genai . NewContentFromText ( "What is the meaning of life?" , genai . RoleUser ), } result , err := client . Models . EmbedContent ( ctx , "gemini-embedding-001" , contents , nil , ) if err != nil { log . Fatal ( err ) } embeddings , err := json . MarshalIndent ( result . Embeddings , "" , " " ) if err != nil { log . Fatal ( err ) } fmt . Println ( string ( embeddings )) } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d '{"model": "models/gemini-embedding-001", "content": {"parts":[{"text": "What is the meaning of life?"}]} }' You can also generate embeddings for multiple chunks at once by passing them in as a list of strings. Python from google import genai client = genai . Client () result = client . models . embed_content ( model = "gemini-embedding-001" , contents = [ "What is the meaning of life?" , "What is the purpose of existence?" , "How do I bake a cake?" ]) for embedding in result . embeddings : print ( embedding ) JavaScript import { GoogleGenAI } from "@google/genai" ; async function main () { const ai = new GoogleGenAI ({}); const response = await ai . models . embedContent ({ model : 'gemini-embedding-001' , contents : [ 'What is the meaning of life?' 
, 'What is the purpose of existence?' , 'How do I bake a cake?' ], }); console . log ( response . embeddings ); } main (); Go package main import ( "context" "encoding/json" "fmt" "log" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } contents := [] * genai . Content { genai . NewContentFromText ( "What is the meaning of life?" ), genai . NewContentFromText ( "How does photosynthesis work?" ), genai . NewContentFromText ( "Tell me about the history of the internet." ), } result , err := client . Models . EmbedContent ( ctx , "gemini-embedding-001" , contents , nil , ) if err != nil { log . Fatal ( err ) } embeddings , err := json . MarshalIndent ( result . Embeddings , "" , " " ) if err != nil { log . Fatal ( err ) } fmt . Println ( string ( embeddings )) } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d '{"model": "models/gemini-embedding-001", "content": [ {"parts": [{"text": "What is the meaning of life?"}]}, {"parts": [{"text": "What is the purpose of existence?"}]}, {"parts": [{"text": "How do I bake a cake?"}]} ] }' Specify task type to improve performance You can use embeddings for a wide range of tasks from classification to document search. Specifying the right task type helps optimize the embeddings for the intended relationships, maximizing accuracy and efficiency. For a complete list of supported task types, see the Supported task types table. The following example shows how you can use SEMANTIC_SIMILARITY to check how similar in meaning strings of texts are. Note: Cosine similarity is a good distance metric because it focuses on direction rather than magnitude, which more accurately reflects conceptual closeness. Values range from -1 (opposite) to 1 (greatest similarity). Python from google import genai from google.genai import types import numpy as np from sklearn.metrics.pairwise import cosine_similarity client = genai . Client () texts = [ "What is the meaning of life?" , "What is the purpose of existence?" , "How do I bake a cake?" ] result = [ np . array ( e . values ) for e in client . models . embed_content ( model = "gemini-embedding-001" , contents = texts , config = types . EmbedContentConfig ( task_type = "SEMANTIC_SIMILARITY" )) . embeddings ] # Calculate cosine similarity. Higher scores = greater semantic similarity. embeddings_matrix = np . array ( result ) similarity_matrix = cosine_similarity ( embeddings_matrix ) for i , text1 in enumerate ( texts ): for j in range ( i + 1 , len ( texts )): text2 = texts [ j ] similarity = similarity_matrix [ i , j ] print ( f "Similarity between ' { text1 } ' and ' { text2 } ': { similarity : .4f } " ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as cosineSimilarity from "compute-cosine-similarity" ; async function main () { const ai = new GoogleGenAI ({}); const texts = [ "What is the meaning of life?" , "What is the purpose of existence?" , "How do I bake a cake?" , ]; const response = await ai . models . embedContent ({ model : 'gemini-embedding-001' , contents : texts , taskType : 'SEMANTIC_SIMILARITY' }); const embeddings = response . embeddings . map ( e = > e . values ); for ( let i = 0 ; i < texts . length ; i ++ ) { for ( let j = i + 1 ; j < texts . 
length ; j ++ ) { const text1 = texts [ i ]; const text2 = texts [ j ]; const similarity = cosineSimilarity ( embeddings [ i ], embeddings [ j ]); console . log ( `Similarity between ' ${ text1 } ' and ' ${ text2 } ': ${ similarity . toFixed ( 4 ) } ` ); } } } main (); Go package main import ( "context" "fmt" "log" "math" "google.golang.org/genai" ) // cosineSimilarity calculates the similarity between two vectors. func cosineSimilarity ( a , b [] float32 ) ( float64 , error ) { if len ( a ) != len ( b ) { return 0 , fmt . Errorf ( "vectors must have the same length" ) } var dotProduct , aMagnitude , bMagnitude float64 for i := 0 ; i < len ( a ); i ++ { dotProduct += float64 ( a [ i ] * b [ i ]) aMagnitude += float64 ( a [ i ] * a [ i ]) bMagnitude += float64 ( b [ i ] * b [ i ]) } if aMagnitude == 0 || bMagnitude == 0 { return 0 , nil } return dotProduct / ( math . Sqrt ( aMagnitude ) * math . Sqrt ( bMagnitude )), nil } func main () { ctx := context . Background () client , _ := genai . NewClient ( ctx , nil ) defer client . Close () texts := [] string { "What is the meaning of life?" , "What is the purpose of existence?" , "How do I bake a cake?" , } var contents [] * genai . Content for _ , text := range texts { contents = append ( contents , genai . NewContentFromText ( text , genai . RoleUser )) } result , _ := client . Models . EmbedContent ( ctx , "gemini-embedding-001" , contents , & genai . EmbedContentRequest { TaskType : genai . TaskTypeSemanticSimilarity }, ) embeddings := result . Embeddings for i := 0 ; i < len ( texts ); i ++ { for j := i + 1 ; j < len ( texts ); j ++ { similarity , _ := cosineSimilarity ( embeddings [ i ]. Values , embeddings [ j ]. Values ) fmt . Printf ( "Similarity between '%s' and '%s': %.4f\n" , texts [ i ], texts [ j ], similarity ) } } } REST curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d '{ "contents": [ {"parts": [{"text": "What is the meaning of life?"}]}, {"parts": [{"text": "What is the purpose of existence?"}]}, {"parts": [{"text": "How do I bake a cake?"}]} ], "embedding_config": { "task_type": "SEMANTIC_SIMILARITY" } }' The following shows an example output from this code snippet: Similarity between 'What is the meaning of life?' and 'What is the purpose of existence?': 0.9481 Similarity between 'What is the meaning of life?' and 'How do I bake a cake?': 0.7471 Similarity between 'What is the purpose of existence?' and 'How do I bake a cake?': 0.7371 Supported task types Task type Description Examples SEMANTIC_SIMILARITY Embeddings optimized to assess text similarity. Recommendation systems, duplicate detection CLASSIFICATION Embeddings optimized to classify texts according to preset labels. Sentiment analysis, spam detection CLUSTERING Embeddings optimized to cluster texts based on their similarities. Document organization, market research, anomaly detection RETRIEVAL_DOCUMENT Embeddings optimized for document search. Indexing articles, books, or web pages for search. RETRIEVAL_QUERY Embeddings optimized for general search queries. Use RETRIEVAL_QUERY for queries; RETRIEVAL_DOCUMENT for documents to be retrieved. Custom search CODE_RETRIEVAL_QUERY Embeddings optimized for retrieval of code blocks based on natural language queries. Use CODE_RETRIEVAL_QUERY for queries; RETRIEVAL_DOCUMENT for code blocks to be retrieved. 
Code suggestions and search QUESTION_ANSWERING Embeddings for questions in a question-answering system, optimized for finding documents that answer the question. Use QUESTION_ANSWERING for questions; RETRIEVAL_DOCUMENT for documents to be retrieved. Chatbox FACT_VERIFICATION Embeddings for statements that need to be verified, optimized for retrieving documents that contain evidence supporting or refuting the statement. Use FACT_VERIFICATION for the target text; RETRIEVAL_DOCUMENT for documents to be retrieved Automated fact-checking systems Controlling embedding size The Gemini embedding model, gemini-embedding-001 , is trained using the Matryoshka Representation Learning (MRL) technique which teaches a model to learn high-dimensional embeddings that have initial segments (or prefixes) which are also useful, simpler versions of the same data. Use the output_dimensionality parameter to control the size of the output embedding vector. Selecting a smaller output dimensionality can save storage space and increase computational efficiency for downstream applications, while sacrificing little in terms of quality. By default, it outputs a 3072-dimensional embedding, but you can truncate it to a smaller size without losing quality to save storage space. We recommend using 768, 1536, or 3072 output dimensions. Python from google import genai from google.genai import types client = genai . Client () result = client . models . embed_content ( model = "gemini-embedding-001" , contents = "What is the meaning of life?" , config = types . EmbedContentConfig ( output_dimensionality = 768 ) ) [ embedding_obj ] = result . embeddings embedding_length = len ( embedding_obj . values ) print ( f "Length of embedding: { embedding_length } " ) JavaScript import { GoogleGenAI } from "@google/genai" ; async function main () { const ai = new GoogleGenAI ({}); const response = await ai . models . embedContent ({ model : 'gemini-embedding-001' , content : 'What is the meaning of life?' , outputDimensionality : 768 , }); const embeddingLength = response . embedding . values . length ; console . log ( `Length of embedding: ${ embeddingLength } ` ); } main (); Go package main import ( "context" "fmt" "log" "google.golang.org/genai" ) func main () { ctx := context . Background () // The client uses Application Default Credentials. // Authenticate with 'gcloud auth application-default login'. client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } defer client . Close () contents := [] * genai . Content { genai . NewContentFromText ( "What is the meaning of life?" , genai . RoleUser ), } result , err := client . Models . EmbedContent ( ctx , "gemini-embedding-001" , contents , & genai . EmbedContentRequest { OutputDimensionality : 768 }, ) if err != nil { log . Fatal ( err ) } embedding := result . Embeddings [ 0 ] embeddingLength := len ( embedding . Values ) fmt . Printf ( "Length of embedding: %d\n" , embeddingLength ) } REST curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent" \ -H "x-goog-api-key: YOUR_GEMINI_API_KEY" \ -H 'Content-Type: application/json' \ -d '{ "contents": [ {"parts": [{"text": "What is the meaning of life?"}]} ], "embedding_config": { "output_dimensionality": 768 } }' Example output from the code snippet: Length of embedding: 768 Ensuring quality for smaller dimensions The 3072 dimension embedding is normalized. 
Normalized embeddings produce more accurate semantic similarity by comparing vector direction, not magnitude. For other dimensions, including 768 and 1536, you need to normalize the embeddings as follows: Python import numpy as np from numpy.linalg import norm embedding_values_np = np . array ( embedding_obj . values ) normed_embedding = embedding_values_np / np . linalg . norm ( embedding_values_np ) print ( f "Normed embedding length: { len ( normed_embedding ) } " ) print ( f "Norm of normed embedding: { np . linalg . norm ( normed_embedding ) : .6f } " ) # Should be very close to 1 Example output from this code snippet: Normed embedding length: 768 Norm of normed embedding: 1.000000 The following table shows the MTEB scores, a commonly used benchmark for embeddings, for different dimensions. Notably, the result shows that performance is not strictly tied to the size of the embedding dimension, with lower dimensions achieving scores comparable to their higher dimension counterparts. MRL Dimension MTEB Score 2048 68.16 1536 68.17 768 67.99 512 67.55 256 66.19 128 63.31 Use cases Text embeddings are crucial for a variety of common AI use cases, such as: Retrieval-Augmented Generation (RAG): Embeddings enhance the quality of generated text by retrieving and incorporating relevant information into the context of a model. Information retrieval: Search for the most semantically similar text or documents given a piece of input text. Document search tutorial task Search reranking : Prioritize the most relevant items by semantically scoring initial results against the query. Search reranking tutorial task Anomaly detection: Comparing groups of embeddings can help identify hidden trends or outliers. Anomaly detection tutorial bubble_chart Classification: Automatically categorize text based on its content, such as sentiment analysis or spam detection Classification tutorial token Clustering: Effectively grasp complex relationships by creating clusters and visualizations of your embeddings. Clustering visualization tutorial bubble_chart Storing embeddings As you take embeddings to production, it is common to use vector databases to efficiently store, index, and retrieve high-dimensional embeddings. Google Cloud offers managed data services that can be used for this purpose including BigQuery , AlloyDB , and Cloud SQL . The following tutorials show how to use other third party vector databases with Gemini Embedding. ChromaDB tutorials bolt QDrant tutorials bolt Weaviate tutorials bolt Pinecone tutorials bolt Model versions Property Description id_card Model code Gemini API gemini-embedding-001 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size Flexible, supports: 128 - 3072, Recommended: 768, 1536, 3072 123 Versions Read the model version patterns for more details. Stable: gemini-embedding-001 Experimental: gemini-embedding-exp-03-07 calendar_month Latest update June 2025 Batch embeddings If latency is not a concern, try using the Gemini Embeddings model with Batch API . This allows for much higher throughput at 50% of interactive Embedding pricing. Find examples on how to get started in the Batch API cookbook . Responsible use notice Unlike generative AI models that create new content, the Gemini Embedding model is only intended to transform the format of your input data into a numerical representation. 
While Google is responsible for providing an embedding model that transforms the format of your input data to the numerical-format requested, users retain full responsibility for the data they input and the resulting embeddings. By using the Gemini Embedding model you confirm that you have the necessary rights to any content that you upload. Do not generate content that infringes on others' intellectual property or privacy rights. Your use of this service is subject to our Prohibited Use Policy and Google's Terms of Service . Start building with embeddings Check out the embeddings quickstart notebook to explore the model capabilities and learn how to customize and visualize your embeddings. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-10 UTC.
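The task-type table pairs RETRIEVAL_QUERY with RETRIEVAL_DOCUMENT, but the code samples on this page only demonstrate SEMANTIC_SIMILARITY. The sketch below shows one way that pairing could be used for a toy document search with the same embed_content API; the corpus, query, and brute-force ranking are illustrative assumptions, and a production system would more likely use one of the vector databases mentioned in the Storing embeddings section.

# Minimal document-search sketch using the RETRIEVAL_DOCUMENT / RETRIEVAL_QUERY
# pairing from the task-type table. The toy corpus and brute-force ranking are
# illustrative only.
import numpy as np
from google import genai
from google.genai import types

client = genai.Client()

documents = [
    "The Files API stores uploaded media for 48 hours.",
    "Embeddings map text to high-dimensional vectors.",
    "Brown rice is a complex carbohydrate.",
]

# Embed the corpus with the document-side task type.
doc_result = client.models.embed_content(
    model="gemini-embedding-001",
    contents=documents,
    config=types.EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT"),
)
doc_vectors = np.array([e.values for e in doc_result.embeddings])

# Embed the query with the query-side task type.
query = "How long are uploaded files kept?"
query_result = client.models.embed_content(
    model="gemini-embedding-001",
    contents=query,
    config=types.EmbedContentConfig(task_type="RETRIEVAL_QUERY"),
)
query_vector = np.array(query_result.embeddings[0].values)

# Rank documents by cosine similarity (direction, not magnitude).
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.4f}  {documents[idx]}")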
text_content/docs_embeddings_d8be8095.txt ADDED
@@ -0,0 +1,5 @@
1
+ URL: https://ai.google.dev/gemini-api/docs/embeddings#supported-task-types
2
+ Title: Embeddings  |  Gemini API  |  Google AI for Developers
3
+ ==================================================
4
+
5
+ Embeddings | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Embeddings The Gemini API offers text embedding models to generate embeddings for words, phrases, sentences, and code. These foundational embeddings power advanced NLP tasks such as semantic search, classification, and clustering, providing more accurate, context-aware results than keyword-based approaches. Building Retrieval Augmented Generation (RAG) systems is a common use case for embeddings. Embeddings play a key role in significantly enhancing model outputs with improved factual accuracy, coherence, and contextual richness. They efficiently retrieve relevant information from knowledge bases, represented by embeddings, which are then passed as additional context in the input prompt to language models, guiding it to generate more informed and accurate responses. To learn more about the available embedding model variants, see the Model versions section. For higher throughput serving at half the price, try Batch API Embedding . Generating embeddings Use the embedContent method to generate text embeddings: Python from google import genai client = genai . Client () result = client . models . embed_content ( model = "gemini-embedding-001" , contents = "What is the meaning of life?" ) print ( result . embeddings ) JavaScript import { GoogleGenAI } from "@google/genai" ; async function main () { const ai = new GoogleGenAI ({}); const response = await ai . models . embedContent ({ model : 'gemini-embedding-001' , contents : 'What is the meaning of life?' , }); console . log ( response . embeddings ); } main (); Go package main import ( "context" "encoding/json" "fmt" "log" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } contents := [] * genai . Content { genai . NewContentFromText ( "What is the meaning of life?" , genai . RoleUser ), } result , err := client . Models . EmbedContent ( ctx , "gemini-embedding-001" , contents , nil , ) if err != nil { log . Fatal ( err ) } embeddings , err := json . MarshalIndent ( result . Embeddings , "" , " " ) if err != nil { log . Fatal ( err ) } fmt . Println ( string ( embeddings )) } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d '{"model": "models/gemini-embedding-001", "content": {"parts":[{"text": "What is the meaning of life?"}]} }' You can also generate embeddings for multiple chunks at once by passing them in as a list of strings. Python from google import genai client = genai . Client () result = client . models . embed_content ( model = "gemini-embedding-001" , contents = [ "What is the meaning of life?" , "What is the purpose of existence?" , "How do I bake a cake?" ]) for embedding in result . embeddings : print ( embedding ) JavaScript import { GoogleGenAI } from "@google/genai" ; async function main () { const ai = new GoogleGenAI ({}); const response = await ai . models . embedContent ({ model : 'gemini-embedding-001' , contents : [ 'What is the meaning of life?' 
, 'What is the purpose of existence?' , 'How do I bake a cake?' ], }); console . log ( response . embeddings ); } main (); Go package main import ( "context" "encoding/json" "fmt" "log" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } contents := [] * genai . Content { genai . NewContentFromText ( "What is the meaning of life?" ), genai . NewContentFromText ( "How does photosynthesis work?" ), genai . NewContentFromText ( "Tell me about the history of the internet." ), } result , err := client . Models . EmbedContent ( ctx , "gemini-embedding-001" , contents , nil , ) if err != nil { log . Fatal ( err ) } embeddings , err := json . MarshalIndent ( result . Embeddings , "" , " " ) if err != nil { log . Fatal ( err ) } fmt . Println ( string ( embeddings )) } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d '{"model": "models/gemini-embedding-001", "content": [ {"parts": [{"text": "What is the meaning of life?"}]}, {"parts": [{"text": "What is the purpose of existence?"}]}, {"parts": [{"text": "How do I bake a cake?"}]} ] }' Specify task type to improve performance You can use embeddings for a wide range of tasks from classification to document search. Specifying the right task type helps optimize the embeddings for the intended relationships, maximizing accuracy and efficiency. For a complete list of supported task types, see the Supported task types table. The following example shows how you can use SEMANTIC_SIMILARITY to check how similar in meaning strings of texts are. Note: Cosine similarity is a good distance metric because it focuses on direction rather than magnitude, which more accurately reflects conceptual closeness. Values range from -1 (opposite) to 1 (greatest similarity). Python from google import genai from google.genai import types import numpy as np from sklearn.metrics.pairwise import cosine_similarity client = genai . Client () texts = [ "What is the meaning of life?" , "What is the purpose of existence?" , "How do I bake a cake?" ] result = [ np . array ( e . values ) for e in client . models . embed_content ( model = "gemini-embedding-001" , contents = texts , config = types . EmbedContentConfig ( task_type = "SEMANTIC_SIMILARITY" )) . embeddings ] # Calculate cosine similarity. Higher scores = greater semantic similarity. embeddings_matrix = np . array ( result ) similarity_matrix = cosine_similarity ( embeddings_matrix ) for i , text1 in enumerate ( texts ): for j in range ( i + 1 , len ( texts )): text2 = texts [ j ] similarity = similarity_matrix [ i , j ] print ( f "Similarity between ' { text1 } ' and ' { text2 } ': { similarity : .4f } " ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as cosineSimilarity from "compute-cosine-similarity" ; async function main () { const ai = new GoogleGenAI ({}); const texts = [ "What is the meaning of life?" , "What is the purpose of existence?" , "How do I bake a cake?" , ]; const response = await ai . models . embedContent ({ model : 'gemini-embedding-001' , contents : texts , taskType : 'SEMANTIC_SIMILARITY' }); const embeddings = response . embeddings . map ( e = > e . values ); for ( let i = 0 ; i < texts . length ; i ++ ) { for ( let j = i + 1 ; j < texts . 
length ; j ++ ) { const text1 = texts [ i ]; const text2 = texts [ j ]; const similarity = cosineSimilarity ( embeddings [ i ], embeddings [ j ]); console . log ( `Similarity between ' ${ text1 } ' and ' ${ text2 } ': ${ similarity . toFixed ( 4 ) } ` ); } } } main (); Go package main import ( "context" "fmt" "log" "math" "google.golang.org/genai" ) // cosineSimilarity calculates the similarity between two vectors. func cosineSimilarity ( a , b [] float32 ) ( float64 , error ) { if len ( a ) != len ( b ) { return 0 , fmt . Errorf ( "vectors must have the same length" ) } var dotProduct , aMagnitude , bMagnitude float64 for i := 0 ; i < len ( a ); i ++ { dotProduct += float64 ( a [ i ] * b [ i ]) aMagnitude += float64 ( a [ i ] * a [ i ]) bMagnitude += float64 ( b [ i ] * b [ i ]) } if aMagnitude == 0 || bMagnitude == 0 { return 0 , nil } return dotProduct / ( math . Sqrt ( aMagnitude ) * math . Sqrt ( bMagnitude )), nil } func main () { ctx := context . Background () client , _ := genai . NewClient ( ctx , nil ) defer client . Close () texts := [] string { "What is the meaning of life?" , "What is the purpose of existence?" , "How do I bake a cake?" , } var contents [] * genai . Content for _ , text := range texts { contents = append ( contents , genai . NewContentFromText ( text , genai . RoleUser )) } result , _ := client . Models . EmbedContent ( ctx , "gemini-embedding-001" , contents , & genai . EmbedContentRequest { TaskType : genai . TaskTypeSemanticSimilarity }, ) embeddings := result . Embeddings for i := 0 ; i < len ( texts ); i ++ { for j := i + 1 ; j < len ( texts ); j ++ { similarity , _ := cosineSimilarity ( embeddings [ i ]. Values , embeddings [ j ]. Values ) fmt . Printf ( "Similarity between '%s' and '%s': %.4f\n" , texts [ i ], texts [ j ], similarity ) } } } REST curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d '{ "contents": [ {"parts": [{"text": "What is the meaning of life?"}]}, {"parts": [{"text": "What is the purpose of existence?"}]}, {"parts": [{"text": "How do I bake a cake?"}]} ], "embedding_config": { "task_type": "SEMANTIC_SIMILARITY" } }' The following shows an example output from this code snippet: Similarity between 'What is the meaning of life?' and 'What is the purpose of existence?': 0.9481 Similarity between 'What is the meaning of life?' and 'How do I bake a cake?': 0.7471 Similarity between 'What is the purpose of existence?' and 'How do I bake a cake?': 0.7371 Supported task types Task type Description Examples SEMANTIC_SIMILARITY Embeddings optimized to assess text similarity. Recommendation systems, duplicate detection CLASSIFICATION Embeddings optimized to classify texts according to preset labels. Sentiment analysis, spam detection CLUSTERING Embeddings optimized to cluster texts based on their similarities. Document organization, market research, anomaly detection RETRIEVAL_DOCUMENT Embeddings optimized for document search. Indexing articles, books, or web pages for search. RETRIEVAL_QUERY Embeddings optimized for general search queries. Use RETRIEVAL_QUERY for queries; RETRIEVAL_DOCUMENT for documents to be retrieved. Custom search CODE_RETRIEVAL_QUERY Embeddings optimized for retrieval of code blocks based on natural language queries. Use CODE_RETRIEVAL_QUERY for queries; RETRIEVAL_DOCUMENT for code blocks to be retrieved. 
Code suggestions and search QUESTION_ANSWERING Embeddings for questions in a question-answering system, optimized for finding documents that answer the question. Use QUESTION_ANSWERING for questions; RETRIEVAL_DOCUMENT for documents to be retrieved. Chatbox FACT_VERIFICATION Embeddings for statements that need to be verified, optimized for retrieving documents that contain evidence supporting or refuting the statement. Use FACT_VERIFICATION for the target text; RETRIEVAL_DOCUMENT for documents to be retrieved Automated fact-checking systems Controlling embedding size The Gemini embedding model, gemini-embedding-001 , is trained using the Matryoshka Representation Learning (MRL) technique which teaches a model to learn high-dimensional embeddings that have initial segments (or prefixes) which are also useful, simpler versions of the same data. Use the output_dimensionality parameter to control the size of the output embedding vector. Selecting a smaller output dimensionality can save storage space and increase computational efficiency for downstream applications, while sacrificing little in terms of quality. By default, it outputs a 3072-dimensional embedding, but you can truncate it to a smaller size without losing quality to save storage space. We recommend using 768, 1536, or 3072 output dimensions. Python from google import genai from google.genai import types client = genai . Client () result = client . models . embed_content ( model = "gemini-embedding-001" , contents = "What is the meaning of life?" , config = types . EmbedContentConfig ( output_dimensionality = 768 ) ) [ embedding_obj ] = result . embeddings embedding_length = len ( embedding_obj . values ) print ( f "Length of embedding: { embedding_length } " ) JavaScript import { GoogleGenAI } from "@google/genai" ; async function main () { const ai = new GoogleGenAI ({}); const response = await ai . models . embedContent ({ model : 'gemini-embedding-001' , content : 'What is the meaning of life?' , outputDimensionality : 768 , }); const embeddingLength = response . embedding . values . length ; console . log ( `Length of embedding: ${ embeddingLength } ` ); } main (); Go package main import ( "context" "fmt" "log" "google.golang.org/genai" ) func main () { ctx := context . Background () // The client uses Application Default Credentials. // Authenticate with 'gcloud auth application-default login'. client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } defer client . Close () contents := [] * genai . Content { genai . NewContentFromText ( "What is the meaning of life?" , genai . RoleUser ), } result , err := client . Models . EmbedContent ( ctx , "gemini-embedding-001" , contents , & genai . EmbedContentRequest { OutputDimensionality : 768 }, ) if err != nil { log . Fatal ( err ) } embedding := result . Embeddings [ 0 ] embeddingLength := len ( embedding . Values ) fmt . Printf ( "Length of embedding: %d\n" , embeddingLength ) } REST curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-001:embedContent" \ -H "x-goog-api-key: YOUR_GEMINI_API_KEY" \ -H 'Content-Type: application/json' \ -d '{ "contents": [ {"parts": [{"text": "What is the meaning of life?"}]} ], "embedding_config": { "output_dimensionality": 768 } }' Example output from the code snippet: Length of embedding: 768 Ensuring quality for smaller dimensions The 3072 dimension embedding is normalized. 
Normalized embeddings produce more accurate semantic similarity by comparing vector direction, not magnitude. For other dimensions, including 768 and 1536, you need to normalize the embeddings as follows: Python import numpy as np from numpy.linalg import norm embedding_values_np = np . array ( embedding_obj . values ) normed_embedding = embedding_values_np / np . linalg . norm ( embedding_values_np ) print ( f "Normed embedding length: { len ( normed_embedding ) } " ) print ( f "Norm of normed embedding: { np . linalg . norm ( normed_embedding ) : .6f } " ) # Should be very close to 1 Example output from this code snippet: Normed embedding length: 768 Norm of normed embedding: 1.000000 The following table shows the MTEB scores, a commonly used benchmark for embeddings, for different dimensions. Notably, the result shows that performance is not strictly tied to the size of the embedding dimension, with lower dimensions achieving scores comparable to their higher dimension counterparts. MRL Dimension MTEB Score 2048 68.16 1536 68.17 768 67.99 512 67.55 256 66.19 128 63.31 Use cases Text embeddings are crucial for a variety of common AI use cases, such as: Retrieval-Augmented Generation (RAG): Embeddings enhance the quality of generated text by retrieving and incorporating relevant information into the context of a model. Information retrieval: Search for the most semantically similar text or documents given a piece of input text. Document search tutorial task Search reranking : Prioritize the most relevant items by semantically scoring initial results against the query. Search reranking tutorial task Anomaly detection: Comparing groups of embeddings can help identify hidden trends or outliers. Anomaly detection tutorial bubble_chart Classification: Automatically categorize text based on its content, such as sentiment analysis or spam detection Classification tutorial token Clustering: Effectively grasp complex relationships by creating clusters and visualizations of your embeddings. Clustering visualization tutorial bubble_chart Storing embeddings As you take embeddings to production, it is common to use vector databases to efficiently store, index, and retrieve high-dimensional embeddings. Google Cloud offers managed data services that can be used for this purpose including BigQuery , AlloyDB , and Cloud SQL . The following tutorials show how to use other third party vector databases with Gemini Embedding. ChromaDB tutorials bolt QDrant tutorials bolt Weaviate tutorials bolt Pinecone tutorials bolt Model versions Property Description id_card Model code Gemini API gemini-embedding-001 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size Flexible, supports: 128 - 3072, Recommended: 768, 1536, 3072 123 Versions Read the model version patterns for more details. Stable: gemini-embedding-001 Experimental: gemini-embedding-exp-03-07 calendar_month Latest update June 2025 Batch embeddings If latency is not a concern, try using the Gemini Embeddings model with Batch API . This allows for much higher throughput at 50% of interactive Embedding pricing. Find examples on how to get started in the Batch API cookbook . Responsible use notice Unlike generative AI models that create new content, the Gemini Embedding model is only intended to transform the format of your input data into a numerical representation. 
While Google is responsible for providing an embedding model that transforms the format of your input data to the numerical-format requested, users retain full responsibility for the data they input and the resulting embeddings. By using the Gemini Embedding model you confirm that you have the necessary rights to any content that you upload. Do not generate content that infringes on others' intellectual property or privacy rights. Your use of this service is subject to our Prohibited Use Policy and Google's Terms of Service . Start building with embeddings Check out the embeddings quickstart notebook to explore the model capabilities and learn how to customize and visualize your embeddings. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-10 UTC.
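Tying together the output_dimensionality and normalization snippets above, the following sketch requests 768-dimensional embeddings, normalizes them as the page recommends for dimensions other than 3072, and compares two texts with a dot product, which for unit vectors equals cosine similarity. The example texts are placeholders.

# End-to-end sketch: truncated 768-dimensional embeddings, manual normalization
# (needed for dimensions other than 3072, per the section above), then a dot
# product between unit vectors as the similarity score.
import numpy as np
from google import genai
from google.genai import types

client = genai.Client()

texts = ["What is the meaning of life?", "How do I bake a cake?"]
result = client.models.embed_content(
    model="gemini-embedding-001",
    contents=texts,
    config=types.EmbedContentConfig(
        task_type="SEMANTIC_SIMILARITY",
        output_dimensionality=768,
    ),
)

# Normalize each truncated embedding to unit length.
vectors = np.array([e.values for e in result.embeddings])
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

# For unit vectors, the dot product is the cosine similarity.
similarity = float(vectors[0] @ vectors[1])
print(f"Similarity at 768 dimensions: {similarity:.4f}")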
text_content/docs_files_1488114d.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/files
2
  Title: Files API  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Files API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Files API The Gemini family of artificial intelligence (AI) models is built to handle various types of input data, including text, images, and audio. Since these models can handle more than one type or mode of data, the Gemini models are called multimodal models or explained as having multimodal capabilities . This guide shows you how to work with media files using the Files API. The basic operations are the same for audio files, images, videos, documents, and other supported file types. For file prompting guidance, check out the File prompt guide section. Upload a file You can use the Files API to upload a media file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. The following code uploads a file and then uses the file in a call to generateContent . Python from google import genai client = genai . Client () myfile = client . files . upload ( file = "path/to/sample.mp3" ) response = client . models . generate_content ( model = "gemini-2.0-flash" , contents = [ "Describe this audio clip" , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Describe this audio clip" , ]), }); console . log ( response . text ); } await main (); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } defer client . DeleteFile ( ctx , file . Name ) model := client . GenerativeModel ( "gemini-2.0-flash" ) resp , err := model . GenerateContent ( ctx , genai . FileData { URI : file . URI }, genai . Text ( "Describe this audio clip" )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) REST AUDIO_PATH = "path/to/sample.mp3" MIME_TYPE = $( file -b --mime-type " ${ AUDIO_PATH } " ) NUM_BYTES = $( wc -c < " ${ AUDIO_PATH } " ) DISPLAY_NAME = AUDIO tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl " ${ BASE_URL } /upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D " ${ tmp_header_file } " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. 
curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ AUDIO_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Describe this audio clip"}, {"file_data":{"mime_type": "${MIME_TYPE}", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Get metadata for a file You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get . Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) file_name = myfile . name myfile = client . files . get ( name = file_name ) print ( myfile ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; const fetchedFile = await ai . files . get ({ name : fileName }); console . log ( fetchedFile ); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } gotFile , err := client . GetFile ( ctx , file . Name ) if err != nil { log . Fatal ( err ) } fmt . Println ( "Got file:" , gotFile . Name ) REST # file_info.json was created in the upload example name = $( jq ".file.name" file_info.json ) # Get the file of interest to check state curl https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " > file_info.json # Print some information about the file you got name = $( jq ".file.name" file_info.json ) echo name = $name file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri List uploaded files You can upload multiple files using the Files API. The following code gets a list of all the files uploaded: Python print ( 'My files:' ) for f in client . files . list (): print ( ' ' , f . name ) JavaScript const listResponse = await ai . files . list ({ config : { pageSize : 10 } }); for await ( const file of listResponse ) { console . log ( file . name ); } Go iter := client . ListFiles ( ctx ) for { ifile , err := iter . Next () if err == iterator . Done { break } if err != nil { log . Fatal ( err ) } fmt . Println ( ifile . Name ) } REST echo "My files: " curl "https://generativelanguage.googleapis.com/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " Delete uploaded files Files are automatically deleted after 48 hours. You can also manually delete an uploaded file: Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) client . files . delete ( name = myfile . name ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; await ai . files . delete ({ name : fileName }); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } client . DeleteFile ( ctx , file . Name ) REST curl --request "DELETE" https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " Usage info You can use the Files API to upload and interact with media files. 
The Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. During that time, you can use the API to get metadata about the files, but you can't download the files. The Files API is available at no cost in all regions where the Gemini API is available. File prompting strategies This section provides guidance and best practices for using media files with prompts for the Gemini API. Being able to use various types of data in your prompts gives you more flexibility in terms of what tasks you can tackle with the Gemini API. For example, you can send the model a photo of a delicious meal and ask it to write a short blog about the meal. Prompt Response Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping. Meal prepping is a great way to save time and money, and it can also help you to eat healthier. This meal is a great example of a healthy and delicious meal that can be easily prepped ahead of time. This meal features brown rice, roasted vegetables, and chicken teriyaki. The brown rice is a healthy and complex carbohydrate that will give you sustained energy. The roasted vegetables are a great way to get your daily dose of vitamins and minerals, and the chicken teriyaki is a delicious and protein-rich option. This meal is also very easy to prepare. Simply cook the brown rice, roast the vegetables, and cook the chicken teriyaki. Once everything is cooked, divide it into meal prep containers and store them in the refrigerator. You can then grab a container and go on busy mornings! If you're looking for a healthy and delicious meal that can be easily prepped ahead of time, this meal is a great option. It's packed with nutrients and flavor, and it's sure to keep you feeling full and satisfied. Here's to healthy and delicious meal prepping! If you are having trouble getting the output you want from prompts that use media files, there are some strategies that can help you get the results you want. The following sections provide design approaches and troubleshooting tips for improving prompts that use multimodal input. You can improve your multimodal prompts by following these best practices: Prompt design fundamentals Be specific in your instructions : Craft clear and concise instructions that leave minimal room for misinterpretation. Add a few examples to your prompt: Use realistic few-shot examples to illustrate what you want to achieve. Break it down step-by-step : Divide complex tasks into manageable sub-goals, guiding the model through the process. Specify the output format : In your prompt, ask for the output to be in the format you want, like markdown, JSON, HTML and more. Put your image first for single-image prompts : While Gemini can handle image and text inputs in any order, for prompts containing a single image, it might perform better if that image (or video) is placed before the text prompt. However, for prompts that require images to be highly interleaved with texts to make sense, use whatever order is most natural. Troubleshooting your multimodal prompt If the model is not drawing information from the relevant part of the image: Drop hints with which aspects of the image you want the prompt to draw information from. 
If the model output is too generic (not tailored enough to the image/video input): At the start of the prompt, try asking the model to describe the image(s) or video before providing the task instruction, or try asking the model to refer to what's in the image. To troubleshoot which part failed: Ask the model to describe the image, or ask the model to explain its reasoning, to gauge the model's initial understanding. If your prompt results in hallucinated content: Try dialing down the temperature setting or asking the model for shorter descriptions so that it's less likely to extrapolate additional details. Tuning the sampling parameters: Experiment with different temperature settings and top-k selections to adjust the model's creativity. Be specific in your instructions Prompts have the most success when they are clear and detailed. If you have a specific output in mind, it's better to include that requirement in the prompt to ensure you get the output you want. For this image of an airport board, asking the model to just "describe this image" could generate a general description. If you need the model to parse the time and city from the image, you can include that request directly in your prompt. Prompt Model response Describe this image. The image shows an airport arrivals and departures board. Updated prompt Improved response Parse the time and city from the airport board shown in this image into a list. 10:50 Moscow 11:05 Edinburgh 11:05 London 11:10 Bucharest 11:30 Kiev 11:35 Dublin 11:45 East Midlands 12:15 Sofia 12:30 London 12:30 Newcastle 12:40 St Petersburg 12:40 London 12:45 Manchester Add a few examples The Gemini model can accept multiple inputs which it can use as examples to understand the output you want. Adding these examples can help the model identify the patterns and apply the relationship between the given images and responses to the new example. This is also called "few-shot" learning. In the example below, the initial output is written in sentence form, and also contains the country (Brazil). Suppose you want the output in a different format or style, and you want only the city, not the country. Adding few-shot examples to your prompt can steer the model to respond in the way you want. Prompt Model response Determine the city along with the landmark. The landmark is the Christ the Redeemer statue in Rio de Janeiro, Brazil. Updated prompt Improved response Determine the city along with the landmark. city: Rome, landmark: the Colosseum. city: Beijing, landmark: Forbidden City city: Rio de Janeiro, landmark: Christ the Redeemer statue Break it down step-by-step For complex tasks like the ones that require both visual understanding and reasoning, it can be helpful to split the task into smaller, more straightforward steps. Alternatively, it could also be effective if you directly ask the model to “think step by step” in your prompt. Prompt Model response When will I run out of toilet paper? Soon, you only have 3 rolls left. Updated prompt Improved response 1. First, count how many toilet paper rolls are in this picture. 2. Then, determine how much toilet paper a typical person uses per day. 3. Calculate how long these rolls of toilet paper will last. 1. There are 3 rolls of toilet paper in this picture. 2. A typical person uses about 20 sheets of toilet paper per day. 3. If each roll contains 200 sheets, then each roll will last for about 10 days. Therefore, the 3 rolls will last for about a month. 
Math problems or other types of word problems are great candidates for asking the model to think step-by-step. Prompt Response What is the 4th term in the sequence? -135 The response from the model is incorrect. Some ways to improve this is to ask the model to parse the problem before answering the prompt. Prompt Response Parse the formula in the image first. Then based on the formula, answer "what's the 4th term in the sequence?". Click to expand the result $$ b(n) = b(n-1) * (-3) $$ $$ b(1) = 15 $$ Substitute n = 4 into the formula: $$ b(4) = b(4-1) * (-3) $$ $$ b(4) = b(3) * (-3) $$ We don't know the value of b(3), so substitute n = 3 into the formula: $$ b(3) = b(3-1) * (-3) $$ $$ b(3) = b(2) * (-3) $$ We don't know the value of b(2), so substitute n = 2 into the formula: $$ b(2) = b(2-1) * (-3) $$ $$ b(2) = b(1) * (-3) $$ We know the value of b(1) is 15, so substitute it into the formula: $$ b(2) = 15 * (-3) $$ $$ b(2) = -45 $$ Now that we know the value of b(2), substitute it into the formula for b(3): $$ b(3) = b(2) * (-3) $$ $$ b(3) = -45 * (-3) $$ $$ b(3) = 135 $$ Now that we know the value of b(3), substitute it into the formula for b(4): $$ b(4) = b(3) * (-3) $$ $$ b(4) = 135 * (-3) $$ $$ b(4) = -405 $$ Therefore, the 4th term in the sequence is -405. This model response is correct - now to make the response even more streamlined, you can ask the model to to think step-by-step. Updated prompt Improved response What is the 4th term in the sequence? Think step by step. The first term is 15. The second term is b(1) * (-3) = 15 * (-3) = -45. The third term is b(2) * (-3) = -45 * (-3) = 135. The fourth term is b(3) * (-3) = 135 * (-3) = -405. Specify the output format A common problem is the need for model output to be in a certain format (e.g. markdown, JSON, HTML), especially if the model output needs to be ingested by a downstream task. You can try to produce output in that format by instructing the model to do so within the prompt. Below are two examples: Prompt Model response Parse the table in this image into markdown format | Subject | Status | Time | Start date | Due on | | ------- | ------ | ---- | ---------- | ------ | | Math | Done | 1.5 hrs | 11/29/2023 | 12/1/2023 | | Geography | Done | 30 min | 12/2/2023 | 12/4/2023 | | Physics | In progress | 2 hrs | 12/3/2023 | 12/5/2023 | | Biology | Not started | 6 hrs | 11/2/2016 | 12/6/2023 | Prompt Model response Provide a list of all the following attributes: ingredients, type of cuisine, vegetarian or not, in JSON format ```json { "ingredients": [ "rice", "shrimp", "clams", "mussels", "peas", "tomatoes", "onions", "garlic", "olive oil", "paprika", "salt", "pepper" ], "type of cuisine": "Spanish", "vegetarian": false } ``` Put your image first for single-image prompts While Gemini can interpret image and text in any order within a prompt, placing a single image before the text prompt might lead to better results. In the following samples, you'll notice the image comes first before the text in each prompt. Troubleshooting your multimodal prompt You might need to troubleshoot your prompt if you are not getting a helpful response. Here are a few strategies you could try. If the model is not drawing information from the relevant part of the image To get a more specific response, you can point out which aspects of the image you want the prompt to draw information from to inform its response. Prompt Model response How many days will these diapers last a baby? The diapers will last for 198 days before they run out. 
Updated prompt Improved response How long will these diapers last before I run out? Use the weight shown on the box to determine the child's age, and use the total number of diapers in the box. Divide the total number by how many diapers the child goes through per day. The box contains 198 diapers. The box states the diapers are for a child weighing 22-37 lbs. The average child in this weight range is likely between 1 and 2 years old. A 2-year-old typically goes through 6 diapers per day, therefore, the diapers will last around 33 days (198/6). If the model output is too generic and not tailored enough to the image input To help the model tailor its response to the image(s), try asking it to describe the images before performing its reasoning task. Prompt Model response What is in common between these images? All four images are examples of landscape photography. Updated prompt Improved response First, describe what's in each image in detail. What's in common between these images? The first image shows a person packing a suitcase. The second image shows an empty hallway in a shopping mall. The third image shows a roll of toilet paper. The fourth image shows a dark and stormy sea. The commonality between these images is that they are all examples of liminal spaces. Liminal spaces are places or situations that are in between two other places or situations. They are often characterized by a sense of uncertainty or ambiguity. The first image shows the person in between being at home and being on a trip. The second image shows the hallway in between two stores. The third image is of toilet paper, which can represent a bathroom. A bathroom is a transitional space between private and public realms. It's a place where we prepare ourselves to enter or leave the outside world. The fourth image shows the sea in between being calm and being stormy. Liminal spaces can be both exciting and unsettling. They can be a place of new beginning or a place of endings. They can be a place of reflection or a place of action. You can also try explicitly asking the model to refer to what's in the image. Updated prompt Improved response What is in common between these images? Refer to what's in the images in your response. All of the images are related to travel. The first image shows a person packing a suitcase, the second image shows an empty hallway in a shopping mall, the third image shows a stack of toilet paper, and the fourth image shows a dark and stormy sea. These images are all related to travel because they are all things that people need or use when they travel. The first image shows a person packing a suitcase, which is something that people do when they are going on a trip. The second image shows an empty hallway in a shopping mall, which is a place where people often go to buy things for their trips. The third image shows a stack of toilet paper, which is something that people often use when they are traveling. The fourth image shows a dark and stormy sea, which is something that people often encounter when they are traveling. The same tips work for creative output from images. In the following example with the rabbit, explicitly asking the model to include aspects from the image can improve the output. Prompt Model response Can you write me a poem about this image? Little bunny, so soft and brown, Hopping through the field, up and down. Your ears so long, your tail so small, You're the cutest creature of them all. You nibble on the grass, so green and sweet, And hop away when you see me. 
But I'll always be your friend, Little bunny, until the very end. Updated prompt Improved response Can you write me a descriptive and dramatic poem about this image and include the location? Little bunny on the green, Ears so long, so keen. What do you see out there, In the distance, where you stare? Is it the ocean blue, Sparkling in the sun's hue? Or the waves that crash and roar, A rhythm that you adore? Little bunny, tell me true, What captures your attention so? Is it the salty breeze, Or the seagulls' cries that please? Perhaps it's the vastness of it all, A world so big and small. A place where dreams can roam, And adventures can be sown. Little bunny, I wonder what you dream, As you sit on the grass, so serene. Do you yearn to explore the deep, Or stay on land, where you can leap? Whatever it may be, little bunny, Keep that spark of wonder burning brightly. For in your dreams and aspirations, Lies a world waiting for your creation. Troubleshooting which part of the prompt failed It can be hard to know whether a prompt failed because the model didn't understand the image to begin with, or if it did understand the image but did not perform the correct reasoning steps afterward. To disambiguate between those reasons, ask the model to describe what's in the image. In the following example, if the model responds with a snack that seems surprising when paired with tea (e.g. popcorn), you can first troubleshoot to determine whether the model correctly recognized that the image contains tea. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? Describe what's in this image. Another strategy is to ask the model to explain its reasoning. That can help you narrow down which part of the reasoning broke down, if any. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? What's a snack I can make in 1 minute that would go well with this? Please explain why. What's next Try writing your own multimodal prompts using Google AI Studio . For information on using the Gemini Files API for uploading media files and including them in your prompts, see the Vision , Audio , and Document processing guides. For more guidance on prompt design, like tuning sampling parameters, see the Prompt strategies page. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
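Putting the operations above together, a minimal end-to-end sketch with the google-genai Python client used in the snippets above might look like the following; the file path is a placeholder and the model name simply follows the examples on this page.

```python
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

# Upload a media file (the same call works for audio, images, video, and documents).
myfile = client.files.upload(file="path/to/sample.mp3")

# Confirm the upload and inspect its metadata.
fetched = client.files.get(name=myfile.name)
print(fetched)

# Use the uploaded file in a generateContent call.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=["Describe this audio clip", myfile],
)
print(response.text)

# List everything currently stored for the project.
for f in client.files.list():
    print(" ", f.name)

# Files expire after 48 hours, but they can also be deleted explicitly.
client.files.delete(name=myfile.name)
```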
 
  Title: Files API  |  Gemini API  |  Google AI for Developers
  ==================================================
 
+ Files API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Files API The Gemini family of artificial intelligence (AI) models is built to handle various types of input data, including text, images, and audio. Since these models can handle more than one type or mode of data, the Gemini models are called multimodal models or explained as having multimodal capabilities . This guide shows you how to work with media files using the Files API. The basic operations are the same for audio files, images, videos, documents, and other supported file types. For file prompting guidance, check out the File prompt guide section. Upload a file You can use the Files API to upload a media file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. The following code uploads a file and then uses the file in a call to generateContent . Python from google import genai client = genai . Client () myfile = client . files . upload ( file = "path/to/sample.mp3" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "Describe this audio clip" , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Describe this audio clip" , ]), }); console . log ( response . text ); } await main (); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } defer client . DeleteFile ( ctx , file . Name ) model := client . GenerativeModel ( "gemini-2.5-flash" ) resp , err := model . GenerateContent ( ctx , genai . FileData { URI : file . URI }, genai . Text ( "Describe this audio clip" )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) REST AUDIO_PATH = "path/to/sample.mp3" MIME_TYPE = $( file -b --mime-type " ${ AUDIO_PATH } " ) NUM_BYTES = $( wc -c < " ${ AUDIO_PATH } " ) DISPLAY_NAME = AUDIO tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl " ${ BASE_URL } /upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D " ${ tmp_header_file } " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. 
curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ AUDIO_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Describe this audio clip"}, {"file_data":{"mime_type": "${MIME_TYPE}", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Get metadata for a file You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get . Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) file_name = myfile . name myfile = client . files . get ( name = file_name ) print ( myfile ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; const fetchedFile = await ai . files . get ({ name : fileName }); console . log ( fetchedFile ); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } gotFile , err := client . GetFile ( ctx , file . Name ) if err != nil { log . Fatal ( err ) } fmt . Println ( "Got file:" , gotFile . Name ) REST # file_info.json was created in the upload example name = $( jq ".file.name" file_info.json ) # Get the file of interest to check state curl https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " > file_info.json # Print some information about the file you got name = $( jq ".file.name" file_info.json ) echo name = $name file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri List uploaded files You can upload multiple files using the Files API. The following code gets a list of all the files uploaded: Python print ( 'My files:' ) for f in client . files . list (): print ( ' ' , f . name ) JavaScript const listResponse = await ai . files . list ({ config : { pageSize : 10 } }); for await ( const file of listResponse ) { console . log ( file . name ); } Go iter := client . ListFiles ( ctx ) for { ifile , err := iter . Next () if err == iterator . Done { break } if err != nil { log . Fatal ( err ) } fmt . Println ( ifile . Name ) } REST echo "My files: " curl "https://generativelanguage.googleapis.com/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " Delete uploaded files Files are automatically deleted after 48 hours. You can also manually delete an uploaded file: Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) client . files . delete ( name = myfile . name ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; await ai . files . delete ({ name : fileName }); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } client . DeleteFile ( ctx , file . Name ) REST curl --request "DELETE" https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " Usage info You can use the Files API to upload and interact with media files. 
The Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. During that time, you can use the API to get metadata about the files, but you can't download the files. The Files API is available at no cost in all regions where the Gemini API is available. File prompting strategies This section provides guidance and best practices for using media files with prompts for the Gemini API. Being able to use various types of data in your prompts gives you more flexibility in terms of what tasks you can tackle with the Gemini API. For example, you can send the model a photo of a delicious meal and ask it to write a short blog about the meal. Prompt Response Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping. Meal prepping is a great way to save time and money, and it can also help you to eat healthier. This meal is a great example of a healthy and delicious meal that can be easily prepped ahead of time. This meal features brown rice, roasted vegetables, and chicken teriyaki. The brown rice is a healthy and complex carbohydrate that will give you sustained energy. The roasted vegetables are a great way to get your daily dose of vitamins and minerals, and the chicken teriyaki is a delicious and protein-rich option. This meal is also very easy to prepare. Simply cook the brown rice, roast the vegetables, and cook the chicken teriyaki. Once everything is cooked, divide it into meal prep containers and store them in the refrigerator. You can then grab a container and go on busy mornings! If you're looking for a healthy and delicious meal that can be easily prepped ahead of time, this meal is a great option. It's packed with nutrients and flavor, and it's sure to keep you feeling full and satisfied. Here's to healthy and delicious meal prepping! If you are having trouble getting the output you want from prompts that use media files, there are some strategies that can help you get the results you want. The following sections provide design approaches and troubleshooting tips for improving prompts that use multimodal input. You can improve your multimodal prompts by following these best practices: Prompt design fundamentals Be specific in your instructions : Craft clear and concise instructions that leave minimal room for misinterpretation. Add a few examples to your prompt: Use realistic few-shot examples to illustrate what you want to achieve. Break it down step-by-step : Divide complex tasks into manageable sub-goals, guiding the model through the process. Specify the output format : In your prompt, ask for the output to be in the format you want, like markdown, JSON, HTML and more. Put your image first for single-image prompts : While Gemini can handle image and text inputs in any order, for prompts containing a single image, it might perform better if that image (or video) is placed before the text prompt. However, for prompts that require images to be highly interleaved with texts to make sense, use whatever order is most natural. Troubleshooting your multimodal prompt If the model is not drawing information from the relevant part of the image: Drop hints with which aspects of the image you want the prompt to draw information from. 
If the model output is too generic (not tailored enough to the image/video input): At the start of the prompt, try asking the model to describe the image(s) or video before providing the task instruction, or try asking the model to refer to what's in the image. To troubleshoot which part failed: Ask the model to describe the image, or ask the model to explain its reasoning, to gauge the model's initial understanding. If your prompt results in hallucinated content: Try dialing down the temperature setting or asking the model for shorter descriptions so that it's less likely to extrapolate additional details. Tuning the sampling parameters: Experiment with different temperature settings and top-k selections to adjust the model's creativity. Be specific in your instructions Prompts have the most success when they are clear and detailed. If you have a specific output in mind, it's better to include that requirement in the prompt to ensure you get the output you want. For this image of an airport board, asking the model to just "describe this image" could generate a general description. If you need the model to parse the time and city from the image, you can include that request directly in your prompt. Prompt Model response Describe this image. The image shows an airport arrivals and departures board. Updated prompt Improved response Parse the time and city from the airport board shown in this image into a list. 10:50 Moscow 11:05 Edinburgh 11:05 London 11:10 Bucharest 11:30 Kiev 11:35 Dublin 11:45 East Midlands 12:15 Sofia 12:30 London 12:30 Newcastle 12:40 St Petersburg 12:40 London 12:45 Manchester Add a few examples The Gemini model can accept multiple inputs which it can use as examples to understand the output you want. Adding these examples can help the model identify the patterns and apply the relationship between the given images and responses to the new example. This is also called "few-shot" learning. In the example below, the initial output is written in sentence form, and also contains the country (Brazil). Suppose you want the output in a different format or style, and you want only the city, not the country. Adding few-shot examples to your prompt can steer the model to respond in the way you want. Prompt Model response Determine the city along with the landmark. The landmark is the Christ the Redeemer statue in Rio de Janeiro, Brazil. Updated prompt Improved response Determine the city along with the landmark. city: Rome, landmark: the Colosseum. city: Beijing, landmark: Forbidden City city: Rio de Janeiro, landmark: Christ the Redeemer statue Break it down step-by-step For complex tasks like the ones that require both visual understanding and reasoning, it can be helpful to split the task into smaller, more straightforward steps. Alternatively, it could also be effective if you directly ask the model to “think step by step” in your prompt. Prompt Model response When will I run out of toilet paper? Soon, you only have 3 rolls left. Updated prompt Improved response 1. First, count how many toilet paper rolls are in this picture. 2. Then, determine how much toilet paper a typical person uses per day. 3. Calculate how long these rolls of toilet paper will last. 1. There are 3 rolls of toilet paper in this picture. 2. A typical person uses about 20 sheets of toilet paper per day. 3. If each roll contains 200 sheets, then each roll will last for about 10 days. Therefore, the 3 rolls will last for about a month. 
Math problems or other types of word problems are great candidates for asking the model to think step-by-step. Prompt Response What is the 4th term in the sequence? -135 The response from the model is incorrect. Some ways to improve this is to ask the model to parse the problem before answering the prompt. Prompt Response Parse the formula in the image first. Then based on the formula, answer "what's the 4th term in the sequence?". Click to expand the result $$ b(n) = b(n-1) * (-3) $$ $$ b(1) = 15 $$ Substitute n = 4 into the formula: $$ b(4) = b(4-1) * (-3) $$ $$ b(4) = b(3) * (-3) $$ We don't know the value of b(3), so substitute n = 3 into the formula: $$ b(3) = b(3-1) * (-3) $$ $$ b(3) = b(2) * (-3) $$ We don't know the value of b(2), so substitute n = 2 into the formula: $$ b(2) = b(2-1) * (-3) $$ $$ b(2) = b(1) * (-3) $$ We know the value of b(1) is 15, so substitute it into the formula: $$ b(2) = 15 * (-3) $$ $$ b(2) = -45 $$ Now that we know the value of b(2), substitute it into the formula for b(3): $$ b(3) = b(2) * (-3) $$ $$ b(3) = -45 * (-3) $$ $$ b(3) = 135 $$ Now that we know the value of b(3), substitute it into the formula for b(4): $$ b(4) = b(3) * (-3) $$ $$ b(4) = 135 * (-3) $$ $$ b(4) = -405 $$ Therefore, the 4th term in the sequence is -405. This model response is correct - now to make the response even more streamlined, you can ask the model to to think step-by-step. Updated prompt Improved response What is the 4th term in the sequence? Think step by step. The first term is 15. The second term is b(1) * (-3) = 15 * (-3) = -45. The third term is b(2) * (-3) = -45 * (-3) = 135. The fourth term is b(3) * (-3) = 135 * (-3) = -405. Specify the output format A common problem is the need for model output to be in a certain format (e.g. markdown, JSON, HTML), especially if the model output needs to be ingested by a downstream task. You can try to produce output in that format by instructing the model to do so within the prompt. Below are two examples: Prompt Model response Parse the table in this image into markdown format | Subject | Status | Time | Start date | Due on | | ------- | ------ | ---- | ---------- | ------ | | Math | Done | 1.5 hrs | 11/29/2023 | 12/1/2023 | | Geography | Done | 30 min | 12/2/2023 | 12/4/2023 | | Physics | In progress | 2 hrs | 12/3/2023 | 12/5/2023 | | Biology | Not started | 6 hrs | 11/2/2016 | 12/6/2023 | Prompt Model response Provide a list of all the following attributes: ingredients, type of cuisine, vegetarian or not, in JSON format ```json { "ingredients": [ "rice", "shrimp", "clams", "mussels", "peas", "tomatoes", "onions", "garlic", "olive oil", "paprika", "salt", "pepper" ], "type of cuisine": "Spanish", "vegetarian": false } ``` Put your image first for single-image prompts While Gemini can interpret image and text in any order within a prompt, placing a single image before the text prompt might lead to better results. In the following samples, you'll notice the image comes first before the text in each prompt. Troubleshooting your multimodal prompt You might need to troubleshoot your prompt if you are not getting a helpful response. Here are a few strategies you could try. If the model is not drawing information from the relevant part of the image To get a more specific response, you can point out which aspects of the image you want the prompt to draw information from to inform its response. Prompt Model response How many days will these diapers last a baby? The diapers will last for 198 days before they run out. 
Updated prompt Improved response How long will these diapers last before I run out? Use the weight shown on the box to determine the child's age, and use the total number of diapers in the box. Divide the total number by how many diapers the child goes through per day. The box contains 198 diapers. The box states the diapers are for a child weighing 22-37 lbs. The average child in this weight range is likely between 1 and 2 years old. A 2-year-old typically goes through 6 diapers per day, therefore, the diapers will last around 33 days (198/6). If the model output is too generic and not tailored enough to the image input To help the model tailor its response to the image(s), try asking it to describe the images before performing its reasoning task. Prompt Model response What is in common between these images? All four images are examples of landscape photography. Updated prompt Improved response First, describe what's in each image in detail. What's in common between these images? The first image shows a person packing a suitcase. The second image shows an empty hallway in a shopping mall. The third image shows a roll of toilet paper. The fourth image shows a dark and stormy sea. The commonality between these images is that they are all examples of liminal spaces. Liminal spaces are places or situations that are in between two other places or situations. They are often characterized by a sense of uncertainty or ambiguity. The first image shows the person in between being at home and being on a trip. The second image shows the hallway in between two stores. The third image is of toilet paper, which can represent a bathroom. A bathroom is a transitional space between private and public realms. It's a place where we prepare ourselves to enter or leave the outside world. The fourth image shows the sea in between being calm and being stormy. Liminal spaces can be both exciting and unsettling. They can be a place of new beginning or a place of endings. They can be a place of reflection or a place of action. You can also try explicitly asking the model to refer to what's in the image. Updated prompt Improved response What is in common between these images? Refer to what's in the images in your response. All of the images are related to travel. The first image shows a person packing a suitcase, the second image shows an empty hallway in a shopping mall, the third image shows a stack of toilet paper, and the fourth image shows a dark and stormy sea. These images are all related to travel because they are all things that people need or use when they travel. The first image shows a person packing a suitcase, which is something that people do when they are going on a trip. The second image shows an empty hallway in a shopping mall, which is a place where people often go to buy things for their trips. The third image shows a stack of toilet paper, which is something that people often use when they are traveling. The fourth image shows a dark and stormy sea, which is something that people often encounter when they are traveling. The same tips work for creative output from images. In the following example with the rabbit, explicitly asking the model to include aspects from the image can improve the output. Prompt Model response Can you write me a poem about this image? Little bunny, so soft and brown, Hopping through the field, up and down. Your ears so long, your tail so small, You're the cutest creature of them all. You nibble on the grass, so green and sweet, And hop away when you see me. 
But I'll always be your friend, Little bunny, until the very end. Updated prompt Improved response Can you write me a descriptive and dramatic poem about this image and include the location? Little bunny on the green, Ears so long, so keen. What do you see out there, In the distance, where you stare? Is it the ocean blue, Sparkling in the sun's hue? Or the waves that crash and roar, A rhythm that you adore? Little bunny, tell me true, What captures your attention so? Is it the salty breeze, Or the seagulls' cries that please? Perhaps it's the vastness of it all, A world so big and small. A place where dreams can roam, And adventures can be sown. Little bunny, I wonder what you dream, As you sit on the grass, so serene. Do you yearn to explore the deep, Or stay on land, where you can leap? Whatever it may be, little bunny, Keep that spark of wonder burning brightly. For in your dreams and aspirations, Lies a world waiting for your creation. Troubleshooting which part of the prompt failed It can be hard to know whether a prompt failed because the model didn't understand the image to begin with, or if it did understand the image but did not perform the correct reasoning steps afterward. To disambiguate between those reasons, ask the model to describe what's in the image. In the following example, if the model responds with a snack that seems surprising when paired with tea (e.g. popcorn), you can first troubleshoot to determine whether the model correctly recognized that the image contains tea. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? Describe what's in this image. Another strategy is to ask the model to explain its reasoning. That can help you narrow down which part of the reasoning broke down, if any. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? What's a snack I can make in 1 minute that would go well with this? Please explain why. What's next Try writing your own multimodal prompts using Google AI Studio . For information on using the Gemini Files API for uploading media files and including them in your prompts, see the Vision , Audio , and Document processing guides. For more guidance on prompt design, like tuning sampling parameters, see the Prompt strategies page. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
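To connect the prompting guidance back to the Files API calls shown earlier, here is a rough Python sketch that uploads an image, places it before the text prompt, and asks for JSON output, following the "put your image first" and "specify the output format" tips above; the image path and prompt wording are illustrative, not taken from this page.

```python
from google import genai

client = genai.Client()

# Uploading an image uses the same call as the audio examples (the path is a placeholder).
photo = client.files.upload(file="path/to/meal.jpg")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        photo,  # single image placed before the text, per the guidance above
        "List the ingredients you can see, the type of cuisine, and whether the dish "
        "is vegetarian or not, in JSON format.",
    ],
)
print(response.text)
```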
text_content/docs_files_1a591f45.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/files#specific-instructions
  Title: Files API  |  Gemini API  |  Google AI for Developers
  ==================================================
 
- Files API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Files API The Gemini family of artificial intelligence (AI) models is built to handle various types of input data, including text, images, and audio. Since these models can handle more than one type or mode of data, the Gemini models are called multimodal models or explained as having multimodal capabilities . This guide shows you how to work with media files using the Files API. The basic operations are the same for audio files, images, videos, documents, and other supported file types. For file prompting guidance, check out the File prompt guide section. Upload a file You can use the Files API to upload a media file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. The following code uploads a file and then uses the file in a call to generateContent . Python from google import genai client = genai . Client () myfile = client . files . upload ( file = "path/to/sample.mp3" ) response = client . models . generate_content ( model = "gemini-2.0-flash" , contents = [ "Describe this audio clip" , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Describe this audio clip" , ]), }); console . log ( response . text ); } await main (); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } defer client . DeleteFile ( ctx , file . Name ) model := client . GenerativeModel ( "gemini-2.0-flash" ) resp , err := model . GenerateContent ( ctx , genai . FileData { URI : file . URI }, genai . Text ( "Describe this audio clip" )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) REST AUDIO_PATH = "path/to/sample.mp3" MIME_TYPE = $( file -b --mime-type " ${ AUDIO_PATH } " ) NUM_BYTES = $( wc -c < " ${ AUDIO_PATH } " ) DISPLAY_NAME = AUDIO tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl " ${ BASE_URL } /upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D " ${ tmp_header_file } " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. 
curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ AUDIO_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Describe this audio clip"}, {"file_data":{"mime_type": "${MIME_TYPE}", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Get metadata for a file You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get . Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) file_name = myfile . name myfile = client . files . get ( name = file_name ) print ( myfile ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; const fetchedFile = await ai . files . get ({ name : fileName }); console . log ( fetchedFile ); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } gotFile , err := client . GetFile ( ctx , file . Name ) if err != nil { log . Fatal ( err ) } fmt . Println ( "Got file:" , gotFile . Name ) REST # file_info.json was created in the upload example name = $( jq ".file.name" file_info.json ) # Get the file of interest to check state curl https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " > file_info.json # Print some information about the file you got name = $( jq ".file.name" file_info.json ) echo name = $name file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri List uploaded files You can upload multiple files using the Files API. The following code gets a list of all the files uploaded: Python print ( 'My files:' ) for f in client . files . list (): print ( ' ' , f . name ) JavaScript const listResponse = await ai . files . list ({ config : { pageSize : 10 } }); for await ( const file of listResponse ) { console . log ( file . name ); } Go iter := client . ListFiles ( ctx ) for { ifile , err := iter . Next () if err == iterator . Done { break } if err != nil { log . Fatal ( err ) } fmt . Println ( ifile . Name ) } REST echo "My files: " curl "https://generativelanguage.googleapis.com/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " Delete uploaded files Files are automatically deleted after 48 hours. You can also manually delete an uploaded file: Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) client . files . delete ( name = myfile . name ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; await ai . files . delete ({ name : fileName }); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } client . DeleteFile ( ctx , file . Name ) REST curl --request "DELETE" https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " Usage info You can use the Files API to upload and interact with media files. 
The Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. During that time, you can use the API to get metadata about the files, but you can't download the files. The Files API is available at no cost in all regions where the Gemini API is available. File prompting strategies This section provides guidance and best practices for using media files with prompts for the Gemini API. Being able to use various types of data in your prompts gives you more flexibility in terms of what tasks you can tackle with the Gemini API. For example, you can send the model a photo of a delicious meal and ask it to write a short blog about the meal. Prompt Response Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping. Meal prepping is a great way to save time and money, and it can also help you to eat healthier. This meal is a great example of a healthy and delicious meal that can be easily prepped ahead of time. This meal features brown rice, roasted vegetables, and chicken teriyaki. The brown rice is a healthy and complex carbohydrate that will give you sustained energy. The roasted vegetables are a great way to get your daily dose of vitamins and minerals, and the chicken teriyaki is a delicious and protein-rich option. This meal is also very easy to prepare. Simply cook the brown rice, roast the vegetables, and cook the chicken teriyaki. Once everything is cooked, divide it into meal prep containers and store them in the refrigerator. You can then grab a container and go on busy mornings! If you're looking for a healthy and delicious meal that can be easily prepped ahead of time, this meal is a great option. It's packed with nutrients and flavor, and it's sure to keep you feeling full and satisfied. Here's to healthy and delicious meal prepping! If you are having trouble getting the output you want from prompts that use media files, there are some strategies that can help you get the results you want. The following sections provide design approaches and troubleshooting tips for improving prompts that use multimodal input. You can improve your multimodal prompts by following these best practices: Prompt design fundamentals Be specific in your instructions : Craft clear and concise instructions that leave minimal room for misinterpretation. Add a few examples to your prompt: Use realistic few-shot examples to illustrate what you want to achieve. Break it down step-by-step : Divide complex tasks into manageable sub-goals, guiding the model through the process. Specify the output format : In your prompt, ask for the output to be in the format you want, like markdown, JSON, HTML and more. Put your image first for single-image prompts : While Gemini can handle image and text inputs in any order, for prompts containing a single image, it might perform better if that image (or video) is placed before the text prompt. However, for prompts that require images to be highly interleaved with texts to make sense, use whatever order is most natural. Troubleshooting your multimodal prompt If the model is not drawing information from the relevant part of the image: Drop hints with which aspects of the image you want the prompt to draw information from. 
If the model output is too generic (not tailored enough to the image/video input): At the start of the prompt, try asking the model to describe the image(s) or video before providing the task instruction, or try asking the model to refer to what's in the image. To troubleshoot which part failed: Ask the model to describe the image, or ask the model to explain its reasoning, to gauge the model's initial understanding. If your prompt results in hallucinated content: Try dialing down the temperature setting or asking the model for shorter descriptions so that it's less likely to extrapolate additional details. Tuning the sampling parameters: Experiment with different temperature settings and top-k selections to adjust the model's creativity. Be specific in your instructions Prompts have the most success when they are clear and detailed. If you have a specific output in mind, it's better to include that requirement in the prompt to ensure you get the output you want. For this image of an airport board, asking the model to just "describe this image" could generate a general description. If you need the model to parse the time and city from the image, you can include that request directly in your prompt. Prompt Model response Describe this image. The image shows an airport arrivals and departures board. Updated prompt Improved response Parse the time and city from the airport board shown in this image into a list. 10:50 Moscow 11:05 Edinburgh 11:05 London 11:10 Bucharest 11:30 Kiev 11:35 Dublin 11:45 East Midlands 12:15 Sofia 12:30 London 12:30 Newcastle 12:40 St Petersburg 12:40 London 12:45 Manchester Add a few examples The Gemini model can accept multiple inputs which it can use as examples to understand the output you want. Adding these examples can help the model identify the patterns and apply the relationship between the given images and responses to the new example. This is also called "few-shot" learning. In the example below, the initial output is written in sentence form, and also contains the country (Brazil). Suppose you want the output in a different format or style, and you want only the city, not the country. Adding few-shot examples to your prompt can steer the model to respond in the way you want. Prompt Model response Determine the city along with the landmark. The landmark is the Christ the Redeemer statue in Rio de Janeiro, Brazil. Updated prompt Improved response Determine the city along with the landmark. city: Rome, landmark: the Colosseum. city: Beijing, landmark: Forbidden City city: Rio de Janeiro, landmark: Christ the Redeemer statue Break it down step-by-step For complex tasks like the ones that require both visual understanding and reasoning, it can be helpful to split the task into smaller, more straightforward steps. Alternatively, it could also be effective if you directly ask the model to “think step by step” in your prompt. Prompt Model response When will I run out of toilet paper? Soon, you only have 3 rolls left. Updated prompt Improved response 1. First, count how many toilet paper rolls are in this picture. 2. Then, determine how much toilet paper a typical person uses per day. 3. Calculate how long these rolls of toilet paper will last. 1. There are 3 rolls of toilet paper in this picture. 2. A typical person uses about 20 sheets of toilet paper per day. 3. If each roll contains 200 sheets, then each roll will last for about 10 days. Therefore, the 3 rolls will last for about a month. 
Math problems or other types of word problems are great candidates for asking the model to think step-by-step. Prompt Response What is the 4th term in the sequence? -135 The response from the model is incorrect. Some ways to improve this is to ask the model to parse the problem before answering the prompt. Prompt Response Parse the formula in the image first. Then based on the formula, answer "what's the 4th term in the sequence?". Click to expand the result $$ b(n) = b(n-1) * (-3) $$ $$ b(1) = 15 $$ Substitute n = 4 into the formula: $$ b(4) = b(4-1) * (-3) $$ $$ b(4) = b(3) * (-3) $$ We don't know the value of b(3), so substitute n = 3 into the formula: $$ b(3) = b(3-1) * (-3) $$ $$ b(3) = b(2) * (-3) $$ We don't know the value of b(2), so substitute n = 2 into the formula: $$ b(2) = b(2-1) * (-3) $$ $$ b(2) = b(1) * (-3) $$ We know the value of b(1) is 15, so substitute it into the formula: $$ b(2) = 15 * (-3) $$ $$ b(2) = -45 $$ Now that we know the value of b(2), substitute it into the formula for b(3): $$ b(3) = b(2) * (-3) $$ $$ b(3) = -45 * (-3) $$ $$ b(3) = 135 $$ Now that we know the value of b(3), substitute it into the formula for b(4): $$ b(4) = b(3) * (-3) $$ $$ b(4) = 135 * (-3) $$ $$ b(4) = -405 $$ Therefore, the 4th term in the sequence is -405. This model response is correct - now to make the response even more streamlined, you can ask the model to to think step-by-step. Updated prompt Improved response What is the 4th term in the sequence? Think step by step. The first term is 15. The second term is b(1) * (-3) = 15 * (-3) = -45. The third term is b(2) * (-3) = -45 * (-3) = 135. The fourth term is b(3) * (-3) = 135 * (-3) = -405. Specify the output format A common problem is the need for model output to be in a certain format (e.g. markdown, JSON, HTML), especially if the model output needs to be ingested by a downstream task. You can try to produce output in that format by instructing the model to do so within the prompt. Below are two examples: Prompt Model response Parse the table in this image into markdown format | Subject | Status | Time | Start date | Due on | | ------- | ------ | ---- | ---------- | ------ | | Math | Done | 1.5 hrs | 11/29/2023 | 12/1/2023 | | Geography | Done | 30 min | 12/2/2023 | 12/4/2023 | | Physics | In progress | 2 hrs | 12/3/2023 | 12/5/2023 | | Biology | Not started | 6 hrs | 11/2/2016 | 12/6/2023 | Prompt Model response Provide a list of all the following attributes: ingredients, type of cuisine, vegetarian or not, in JSON format ```json { "ingredients": [ "rice", "shrimp", "clams", "mussels", "peas", "tomatoes", "onions", "garlic", "olive oil", "paprika", "salt", "pepper" ], "type of cuisine": "Spanish", "vegetarian": false } ``` Put your image first for single-image prompts While Gemini can interpret image and text in any order within a prompt, placing a single image before the text prompt might lead to better results. In the following samples, you'll notice the image comes first before the text in each prompt. Troubleshooting your multimodal prompt You might need to troubleshoot your prompt if you are not getting a helpful response. Here are a few strategies you could try. If the model is not drawing information from the relevant part of the image To get a more specific response, you can point out which aspects of the image you want the prompt to draw information from to inform its response. Prompt Model response How many days will these diapers last a baby? The diapers will last for 198 days before they run out. 
Updated prompt Improved response How long will these diapers last before I run out? Use the weight shown on the box to determine the child's age, and use the total number of diapers in the box. Divide the total number by how many diapers the child goes through per day. The box contains 198 diapers. The box states the diapers are for a child weighing 22-37 lbs. The average child in this weight range is likely between 1 and 2 years old. A 2-year-old typically goes through 6 diapers per day, therefore, the diapers will last around 33 days (198/6). If the model output is too generic and not tailored enough to the image input To help the model tailor its response to the image(s), try asking it to describe the images before performing its reasoning task. Prompt Model response What is in common between these images? All four images are examples of landscape photography. Updated prompt Improved response First, describe what's in each image in detail. What's in common between these images? The first image shows a person packing a suitcase. The second image shows an empty hallway in a shopping mall. The third image shows a roll of toilet paper. The fourth image shows a dark and stormy sea. The commonality between these images is that they are all examples of liminal spaces. Liminal spaces are places or situations that are in between two other places or situations. They are often characterized by a sense of uncertainty or ambiguity. The first image shows the person in between being at home and being on a trip. The second image shows the hallway in between two stores. The third image is of toilet paper, which can represent a bathroom. A bathroom is a transitional space between private and public realms. It's a place where we prepare ourselves to enter or leave the outside world. The fourth image shows the sea in between being calm and being stormy. Liminal spaces can be both exciting and unsettling. They can be a place of new beginning or a place of endings. They can be a place of reflection or a place of action. You can also try explicitly asking the model to refer to what's in the image. Updated prompt Improved response What is in common between these images? Refer to what's in the images in your response. All of the images are related to travel. The first image shows a person packing a suitcase, the second image shows an empty hallway in a shopping mall, the third image shows a stack of toilet paper, and the fourth image shows a dark and stormy sea. These images are all related to travel because they are all things that people need or use when they travel. The first image shows a person packing a suitcase, which is something that people do when they are going on a trip. The second image shows an empty hallway in a shopping mall, which is a place where people often go to buy things for their trips. The third image shows a stack of toilet paper, which is something that people often use when they are traveling. The fourth image shows a dark and stormy sea, which is something that people often encounter when they are traveling. The same tips work for creative output from images. In the following example with the rabbit, explicitly asking the model to include aspects from the image can improve the output. Prompt Model response Can you write me a poem about this image? Little bunny, so soft and brown, Hopping through the field, up and down. Your ears so long, your tail so small, You're the cutest creature of them all. You nibble on the grass, so green and sweet, And hop away when you see me. 
But I'll always be your friend, Little bunny, until the very end. Updated prompt Improved response Can you write me a descriptive and dramatic poem about this image and include the location? Little bunny on the green, Ears so long, so keen. What do you see out there, In the distance, where you stare? Is it the ocean blue, Sparkling in the sun's hue? Or the waves that crash and roar, A rhythm that you adore? Little bunny, tell me true, What captures your attention so? Is it the salty breeze, Or the seagulls' cries that please? Perhaps it's the vastness of it all, A world so big and small. A place where dreams can roam, And adventures can be sown. Little bunny, I wonder what you dream, As you sit on the grass, so serene. Do you yearn to explore the deep, Or stay on land, where you can leap? Whatever it may be, little bunny, Keep that spark of wonder burning brightly. For in your dreams and aspirations, Lies a world waiting for your creation. Troubleshooting which part of the prompt failed It can be hard to know whether a prompt failed because the model didn't understand the image to begin with, or if it did understand the image but did not perform the correct reasoning steps afterward. To disambiguate between those reasons, ask the model to describe what's in the image. In the following example, if the model responds with a snack that seems surprising when paired with tea (e.g. popcorn), you can first troubleshoot to determine whether the model correctly recognized that the image contains tea. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? Describe what's in this image. Another strategy is to ask the model to explain its reasoning. That can help you narrow down which part of the reasoning broke down, if any. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? What's a snack I can make in 1 minute that would go well with this? Please explain why. What's next Try writing your own multimodal prompts using Google AI Studio . For information on using the Gemini Files API for uploading media files and including them in your prompts, see the Vision , Audio , and Document processing guides. For more guidance on prompt design, like tuning sampling parameters, see the Prompt strategies page. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
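Tying together two of the tips above (placing a single image first and specifying the output format), a minimal sketch with the Python SDK used throughout these docs might look like the following. This is not taken from the page itself: the image path and the requested JSON keys are illustrative placeholders, and constraining the response MIME type is an optional extra on top of the in-prompt instruction.

```python
# Minimal sketch: single-image prompt with the image first, asking for JSON.
# The image file name and the JSON keys are hypothetical placeholders.
from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

with open("airport_board.jpg", "rb") as f:  # placeholder local file
    image_part = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    # Single-image prompt: image first, then the text instruction.
    contents=[
        image_part,
        "Parse the time and city from the airport board into a JSON list of "
        "objects with 'time' and 'city' keys.",
    ],
    # Constraining the response MIME type complements the in-prompt request.
    config=types.GenerateContentConfig(response_mime_type="application/json"),
)
print(response.text)
```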
 
2
  Title: Files API  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Files API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Files API The Gemini family of artificial intelligence (AI) models is built to handle various types of input data, including text, images, and audio. Since these models can handle more than one type or mode of data, the Gemini models are called multimodal models or explained as having multimodal capabilities . This guide shows you how to work with media files using the Files API. The basic operations are the same for audio files, images, videos, documents, and other supported file types. For file prompting guidance, check out the File prompt guide section. Upload a file You can use the Files API to upload a media file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. The following code uploads a file and then uses the file in a call to generateContent . Python from google import genai client = genai . Client () myfile = client . files . upload ( file = "path/to/sample.mp3" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "Describe this audio clip" , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Describe this audio clip" , ]), }); console . log ( response . text ); } await main (); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } defer client . DeleteFile ( ctx , file . Name ) model := client . GenerativeModel ( "gemini-2.5-flash" ) resp , err := model . GenerateContent ( ctx , genai . FileData { URI : file . URI }, genai . Text ( "Describe this audio clip" )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) REST AUDIO_PATH = "path/to/sample.mp3" MIME_TYPE = $( file -b --mime-type " ${ AUDIO_PATH } " ) NUM_BYTES = $( wc -c < " ${ AUDIO_PATH } " ) DISPLAY_NAME = AUDIO tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl " ${ BASE_URL } /upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D " ${ tmp_header_file } " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. 
curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ AUDIO_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Describe this audio clip"}, {"file_data":{"mime_type": "${MIME_TYPE}", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Get metadata for a file You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get . Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) file_name = myfile . name myfile = client . files . get ( name = file_name ) print ( myfile ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; const fetchedFile = await ai . files . get ({ name : fileName }); console . log ( fetchedFile ); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } gotFile , err := client . GetFile ( ctx , file . Name ) if err != nil { log . Fatal ( err ) } fmt . Println ( "Got file:" , gotFile . Name ) REST # file_info.json was created in the upload example name = $( jq ".file.name" file_info.json ) # Get the file of interest to check state curl https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " > file_info.json # Print some information about the file you got name = $( jq ".file.name" file_info.json ) echo name = $name file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri List uploaded files You can upload multiple files using the Files API. The following code gets a list of all the files uploaded: Python print ( 'My files:' ) for f in client . files . list (): print ( ' ' , f . name ) JavaScript const listResponse = await ai . files . list ({ config : { pageSize : 10 } }); for await ( const file of listResponse ) { console . log ( file . name ); } Go iter := client . ListFiles ( ctx ) for { ifile , err := iter . Next () if err == iterator . Done { break } if err != nil { log . Fatal ( err ) } fmt . Println ( ifile . Name ) } REST echo "My files: " curl "https://generativelanguage.googleapis.com/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " Delete uploaded files Files are automatically deleted after 48 hours. You can also manually delete an uploaded file: Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) client . files . delete ( name = myfile . name ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; await ai . files . delete ({ name : fileName }); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } client . DeleteFile ( ctx , file . Name ) REST curl --request "DELETE" https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " Usage info You can use the Files API to upload and interact with media files. 
The Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. During that time, you can use the API to get metadata about the files, but you can't download the files. The Files API is available at no cost in all regions where the Gemini API is available. File prompting strategies This section provides guidance and best practices for using media files with prompts for the Gemini API. Being able to use various types of data in your prompts gives you more flexibility in terms of what tasks you can tackle with the Gemini API. For example, you can send the model a photo of a delicious meal and ask it to write a short blog about the meal. Prompt Response Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping. Meal prepping is a great way to save time and money, and it can also help you to eat healthier. This meal is a great example of a healthy and delicious meal that can be easily prepped ahead of time. This meal features brown rice, roasted vegetables, and chicken teriyaki. The brown rice is a healthy and complex carbohydrate that will give you sustained energy. The roasted vegetables are a great way to get your daily dose of vitamins and minerals, and the chicken teriyaki is a delicious and protein-rich option. This meal is also very easy to prepare. Simply cook the brown rice, roast the vegetables, and cook the chicken teriyaki. Once everything is cooked, divide it into meal prep containers and store them in the refrigerator. You can then grab a container and go on busy mornings! If you're looking for a healthy and delicious meal that can be easily prepped ahead of time, this meal is a great option. It's packed with nutrients and flavor, and it's sure to keep you feeling full and satisfied. Here's to healthy and delicious meal prepping! If you are having trouble getting the output you want from prompts that use media files, there are some strategies that can help you get the results you want. The following sections provide design approaches and troubleshooting tips for improving prompts that use multimodal input. You can improve your multimodal prompts by following these best practices: Prompt design fundamentals Be specific in your instructions : Craft clear and concise instructions that leave minimal room for misinterpretation. Add a few examples to your prompt: Use realistic few-shot examples to illustrate what you want to achieve. Break it down step-by-step : Divide complex tasks into manageable sub-goals, guiding the model through the process. Specify the output format : In your prompt, ask for the output to be in the format you want, like markdown, JSON, HTML and more. Put your image first for single-image prompts : While Gemini can handle image and text inputs in any order, for prompts containing a single image, it might perform better if that image (or video) is placed before the text prompt. However, for prompts that require images to be highly interleaved with texts to make sense, use whatever order is most natural. Troubleshooting your multimodal prompt If the model is not drawing information from the relevant part of the image: Drop hints with which aspects of the image you want the prompt to draw information from. 
If the model output is too generic (not tailored enough to the image/video input): At the start of the prompt, try asking the model to describe the image(s) or video before providing the task instruction, or try asking the model to refer to what's in the image. To troubleshoot which part failed: Ask the model to describe the image, or ask the model to explain its reasoning, to gauge the model's initial understanding. If your prompt results in hallucinated content: Try dialing down the temperature setting or asking the model for shorter descriptions so that it's less likely to extrapolate additional details. Tuning the sampling parameters: Experiment with different temperature settings and top-k selections to adjust the model's creativity. Be specific in your instructions Prompts have the most success when they are clear and detailed. If you have a specific output in mind, it's better to include that requirement in the prompt to ensure you get the output you want. For this image of an airport board, asking the model to just "describe this image" could generate a general description. If you need the model to parse the time and city from the image, you can include that request directly in your prompt. Prompt Model response Describe this image. The image shows an airport arrivals and departures board. Updated prompt Improved response Parse the time and city from the airport board shown in this image into a list. 10:50 Moscow 11:05 Edinburgh 11:05 London 11:10 Bucharest 11:30 Kiev 11:35 Dublin 11:45 East Midlands 12:15 Sofia 12:30 London 12:30 Newcastle 12:40 St Petersburg 12:40 London 12:45 Manchester Add a few examples The Gemini model can accept multiple inputs which it can use as examples to understand the output you want. Adding these examples can help the model identify the patterns and apply the relationship between the given images and responses to the new example. This is also called "few-shot" learning. In the example below, the initial output is written in sentence form, and also contains the country (Brazil). Suppose you want the output in a different format or style, and you want only the city, not the country. Adding few-shot examples to your prompt can steer the model to respond in the way you want. Prompt Model response Determine the city along with the landmark. The landmark is the Christ the Redeemer statue in Rio de Janeiro, Brazil. Updated prompt Improved response Determine the city along with the landmark. city: Rome, landmark: the Colosseum. city: Beijing, landmark: Forbidden City city: Rio de Janeiro, landmark: Christ the Redeemer statue Break it down step-by-step For complex tasks like the ones that require both visual understanding and reasoning, it can be helpful to split the task into smaller, more straightforward steps. Alternatively, it could also be effective if you directly ask the model to “think step by step” in your prompt. Prompt Model response When will I run out of toilet paper? Soon, you only have 3 rolls left. Updated prompt Improved response 1. First, count how many toilet paper rolls are in this picture. 2. Then, determine how much toilet paper a typical person uses per day. 3. Calculate how long these rolls of toilet paper will last. 1. There are 3 rolls of toilet paper in this picture. 2. A typical person uses about 20 sheets of toilet paper per day. 3. If each roll contains 200 sheets, then each roll will last for about 10 days. Therefore, the 3 rolls will last for about a month. 
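As a rough illustration of the decomposition pattern just shown, the same numbered sub-goals can be sent as an ordinary text part alongside the image using the Python SDK from the earlier examples. The image file name below is a hypothetical placeholder.

```python
# Sketch: send a decomposed, numbered prompt together with a single image.
# The image file name is a hypothetical placeholder.
from google import genai
from google.genai import types

client = genai.Client()

with open("toilet_paper.jpg", "rb") as f:  # placeholder
    image_part = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

prompt = (
    "1. First, count how many toilet paper rolls are in this picture.\n"
    "2. Then, determine how much toilet paper a typical person uses per day.\n"
    "3. Calculate how long these rolls of toilet paper will last."
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[image_part, prompt],  # image first, per the single-image tip
)
print(response.text)
```

For short visual reasoning tasks like this, a single numbered prompt is usually enough; splitting the steps across several turns is only worth it when intermediate answers need to be checked or reused.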
Math problems or other types of word problems are great candidates for asking the model to think step-by-step. Prompt Response What is the 4th term in the sequence? -135 The response from the model is incorrect. One way to improve this is to ask the model to parse the problem before answering the prompt. Prompt Response Parse the formula in the image first. Then based on the formula, answer "what's the 4th term in the sequence?". Click to expand the result $$ b(n) = b(n-1) * (-3) $$ $$ b(1) = 15 $$ Substitute n = 4 into the formula: $$ b(4) = b(4-1) * (-3) $$ $$ b(4) = b(3) * (-3) $$ We don't know the value of b(3), so substitute n = 3 into the formula: $$ b(3) = b(3-1) * (-3) $$ $$ b(3) = b(2) * (-3) $$ We don't know the value of b(2), so substitute n = 2 into the formula: $$ b(2) = b(2-1) * (-3) $$ $$ b(2) = b(1) * (-3) $$ We know the value of b(1) is 15, so substitute it into the formula: $$ b(2) = 15 * (-3) $$ $$ b(2) = -45 $$ Now that we know the value of b(2), substitute it into the formula for b(3): $$ b(3) = b(2) * (-3) $$ $$ b(3) = -45 * (-3) $$ $$ b(3) = 135 $$ Now that we know the value of b(3), substitute it into the formula for b(4): $$ b(4) = b(3) * (-3) $$ $$ b(4) = 135 * (-3) $$ $$ b(4) = -405 $$ Therefore, the 4th term in the sequence is -405. This model response is correct. Now, to make the response even more streamlined, you can ask the model to think step-by-step. Updated prompt Improved response What is the 4th term in the sequence? Think step by step. The first term is 15. The second term is b(1) * (-3) = 15 * (-3) = -45. The third term is b(2) * (-3) = -45 * (-3) = 135. The fourth term is b(3) * (-3) = 135 * (-3) = -405. Specify the output format A common problem is the need for model output to be in a certain format (e.g. markdown, JSON, HTML), especially if the model output needs to be ingested by a downstream task. You can try to produce output in that format by instructing the model to do so within the prompt. Below are two examples: Prompt Model response Parse the table in this image into markdown format | Subject | Status | Time | Start date | Due on | | ------- | ------ | ---- | ---------- | ------ | | Math | Done | 1.5 hrs | 11/29/2023 | 12/1/2023 | | Geography | Done | 30 min | 12/2/2023 | 12/4/2023 | | Physics | In progress | 2 hrs | 12/3/2023 | 12/5/2023 | | Biology | Not started | 6 hrs | 11/2/2016 | 12/6/2023 | Prompt Model response Provide a list of all the following attributes: ingredients, type of cuisine, vegetarian or not, in JSON format ```json { "ingredients": [ "rice", "shrimp", "clams", "mussels", "peas", "tomatoes", "onions", "garlic", "olive oil", "paprika", "salt", "pepper" ], "type of cuisine": "Spanish", "vegetarian": false } ``` Put your image first for single-image prompts While Gemini can interpret image and text in any order within a prompt, placing a single image before the text prompt might lead to better results. In the following samples, you'll notice that the image comes before the text in each prompt. Troubleshooting your multimodal prompt You might need to troubleshoot your prompt if you are not getting a helpful response. Here are a few strategies you could try. If the model is not drawing information from the relevant part of the image To get a more specific response, you can point out which aspects of the image you want the prompt to draw information from to inform its response. Prompt Model response How many days will these diapers last a baby? The diapers will last for 198 days before they run out.
Updated prompt Improved response How long will these diapers last before I run out? Use the weight shown on the box to determine the child's age, and use the total number of diapers in the box. Divide the total number by how many diapers the child goes through per day. The box contains 198 diapers. The box states the diapers are for a child weighing 22-37 lbs. The average child in this weight range is likely between 1 and 2 years old. A 2-year-old typically goes through 6 diapers per day, therefore, the diapers will last around 33 days (198/6). If the model output is too generic and not tailored enough to the image input To help the model tailor its response to the image(s), try asking it to describe the images before performing its reasoning task. Prompt Model response What is in common between these images? All four images are examples of landscape photography. Updated prompt Improved response First, describe what's in each image in detail. What's in common between these images? The first image shows a person packing a suitcase. The second image shows an empty hallway in a shopping mall. The third image shows a roll of toilet paper. The fourth image shows a dark and stormy sea. The commonality between these images is that they are all examples of liminal spaces. Liminal spaces are places or situations that are in between two other places or situations. They are often characterized by a sense of uncertainty or ambiguity. The first image shows the person in between being at home and being on a trip. The second image shows the hallway in between two stores. The third image is of toilet paper, which can represent a bathroom. A bathroom is a transitional space between private and public realms. It's a place where we prepare ourselves to enter or leave the outside world. The fourth image shows the sea in between being calm and being stormy. Liminal spaces can be both exciting and unsettling. They can be a place of new beginning or a place of endings. They can be a place of reflection or a place of action. You can also try explicitly asking the model to refer to what's in the image. Updated prompt Improved response What is in common between these images? Refer to what's in the images in your response. All of the images are related to travel. The first image shows a person packing a suitcase, the second image shows an empty hallway in a shopping mall, the third image shows a stack of toilet paper, and the fourth image shows a dark and stormy sea. These images are all related to travel because they are all things that people need or use when they travel. The first image shows a person packing a suitcase, which is something that people do when they are going on a trip. The second image shows an empty hallway in a shopping mall, which is a place where people often go to buy things for their trips. The third image shows a stack of toilet paper, which is something that people often use when they are traveling. The fourth image shows a dark and stormy sea, which is something that people often encounter when they are traveling. The same tips work for creative output from images. In the following example with the rabbit, explicitly asking the model to include aspects from the image can improve the output. Prompt Model response Can you write me a poem about this image? Little bunny, so soft and brown, Hopping through the field, up and down. Your ears so long, your tail so small, You're the cutest creature of them all. You nibble on the grass, so green and sweet, And hop away when you see me. 
But I'll always be your friend, Little bunny, until the very end. Updated prompt Improved response Can you write me a descriptive and dramatic poem about this image and include the location? Little bunny on the green, Ears so long, so keen. What do you see out there, In the distance, where you stare? Is it the ocean blue, Sparkling in the sun's hue? Or the waves that crash and roar, A rhythm that you adore? Little bunny, tell me true, What captures your attention so? Is it the salty breeze, Or the seagulls' cries that please? Perhaps it's the vastness of it all, A world so big and small. A place where dreams can roam, And adventures can be sown. Little bunny, I wonder what you dream, As you sit on the grass, so serene. Do you yearn to explore the deep, Or stay on land, where you can leap? Whatever it may be, little bunny, Keep that spark of wonder burning brightly. For in your dreams and aspirations, Lies a world waiting for your creation. Troubleshooting which part of the prompt failed It can be hard to know whether a prompt failed because the model didn't understand the image to begin with, or if it did understand the image but did not perform the correct reasoning steps afterward. To disambiguate between those reasons, ask the model to describe what's in the image. In the following example, if the model responds with a snack that seems surprising when paired with tea (e.g. popcorn), you can first troubleshoot to determine whether the model correctly recognized that the image contains tea. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? Describe what's in this image. Another strategy is to ask the model to explain its reasoning. That can help you narrow down which part of the reasoning broke down, if any. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? What's a snack I can make in 1 minute that would go well with this? Please explain why. What's next Try writing your own multimodal prompts using Google AI Studio . For information on using the Gemini Files API for uploading media files and including them in your prompts, see the Vision , Audio , and Document processing guides. For more guidance on prompt design, like tuning sampling parameters, see the Prompt strategies page. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
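One practical consequence of the usage limits above (48-hour retention, per-file processing state) is that pipelines reusing uploaded media may want to confirm a file is ready and check when it expires before prompting with it. The following is only a sketch, assuming the File metadata exposes state and expiration_time fields as in recent versions of the Python SDK; verify the field names against the SDK version you use.

```python
# Sketch: upload a file, wait until it is usable, inspect its lifetime,
# then delete it early instead of waiting for automatic expiry.
# Assumes File exposes `state` and `expiration_time`; check your SDK version.
import time
from google import genai

client = genai.Client()
myfile = client.files.upload(file="path/to/sample.mp3")

# Large media (especially video) can be processed asynchronously; poll until
# the file leaves the processing state. String matching keeps the check
# tolerant of enum-valued versus plain-string states.
while "PROCESSING" in str(myfile.state):
    time.sleep(2)
    myfile = client.files.get(name=myfile.name)

print("name:", myfile.name)
print("state:", myfile.state)
print("expires:", myfile.expiration_time)  # files are kept for 48 hours

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=["Describe this audio clip", myfile],
)
print(response.text)

# Optional cleanup instead of waiting for automatic expiry.
client.files.delete(name=myfile.name)
```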
text_content/docs_files_2dd3ac4c.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/files#troubleshooting
2
  Title: Files API  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Files API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Files API The Gemini family of artificial intelligence (AI) models is built to handle various types of input data, including text, images, and audio. Since these models can handle more than one type or mode of data, the Gemini models are called multimodal models or explained as having multimodal capabilities . This guide shows you how to work with media files using the Files API. The basic operations are the same for audio files, images, videos, documents, and other supported file types. For file prompting guidance, check out the File prompt guide section. Upload a file You can use the Files API to upload a media file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. The following code uploads a file and then uses the file in a call to generateContent . Python from google import genai client = genai . Client () myfile = client . files . upload ( file = "path/to/sample.mp3" ) response = client . models . generate_content ( model = "gemini-2.0-flash" , contents = [ "Describe this audio clip" , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Describe this audio clip" , ]), }); console . log ( response . text ); } await main (); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } defer client . DeleteFile ( ctx , file . Name ) model := client . GenerativeModel ( "gemini-2.0-flash" ) resp , err := model . GenerateContent ( ctx , genai . FileData { URI : file . URI }, genai . Text ( "Describe this audio clip" )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) REST AUDIO_PATH = "path/to/sample.mp3" MIME_TYPE = $( file -b --mime-type " ${ AUDIO_PATH } " ) NUM_BYTES = $( wc -c < " ${ AUDIO_PATH } " ) DISPLAY_NAME = AUDIO tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl " ${ BASE_URL } /upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D " ${ tmp_header_file } " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. 
curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ AUDIO_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Describe this audio clip"}, {"file_data":{"mime_type": "${MIME_TYPE}", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Get metadata for a file You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get . Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) file_name = myfile . name myfile = client . files . get ( name = file_name ) print ( myfile ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; const fetchedFile = await ai . files . get ({ name : fileName }); console . log ( fetchedFile ); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } gotFile , err := client . GetFile ( ctx , file . Name ) if err != nil { log . Fatal ( err ) } fmt . Println ( "Got file:" , gotFile . Name ) REST # file_info.json was created in the upload example name = $( jq ".file.name" file_info.json ) # Get the file of interest to check state curl https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " > file_info.json # Print some information about the file you got name = $( jq ".file.name" file_info.json ) echo name = $name file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri List uploaded files You can upload multiple files using the Files API. The following code gets a list of all the files uploaded: Python print ( 'My files:' ) for f in client . files . list (): print ( ' ' , f . name ) JavaScript const listResponse = await ai . files . list ({ config : { pageSize : 10 } }); for await ( const file of listResponse ) { console . log ( file . name ); } Go iter := client . ListFiles ( ctx ) for { ifile , err := iter . Next () if err == iterator . Done { break } if err != nil { log . Fatal ( err ) } fmt . Println ( ifile . Name ) } REST echo "My files: " curl "https://generativelanguage.googleapis.com/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " Delete uploaded files Files are automatically deleted after 48 hours. You can also manually delete an uploaded file: Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) client . files . delete ( name = myfile . name ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; await ai . files . delete ({ name : fileName }); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } client . DeleteFile ( ctx , file . Name ) REST curl --request "DELETE" https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " Usage info You can use the Files API to upload and interact with media files. 
The Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. During that time, you can use the API to get metadata about the files, but you can't download the files. The Files API is available at no cost in all regions where the Gemini API is available. File prompting strategies This section provides guidance and best practices for using media files with prompts for the Gemini API. Being able to use various types of data in your prompts gives you more flexibility in terms of what tasks you can tackle with the Gemini API. For example, you can send the model a photo of a delicious meal and ask it to write a short blog about the meal. Prompt Response Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping. Meal prepping is a great way to save time and money, and it can also help you to eat healthier. This meal is a great example of a healthy and delicious meal that can be easily prepped ahead of time. This meal features brown rice, roasted vegetables, and chicken teriyaki. The brown rice is a healthy and complex carbohydrate that will give you sustained energy. The roasted vegetables are a great way to get your daily dose of vitamins and minerals, and the chicken teriyaki is a delicious and protein-rich option. This meal is also very easy to prepare. Simply cook the brown rice, roast the vegetables, and cook the chicken teriyaki. Once everything is cooked, divide it into meal prep containers and store them in the refrigerator. You can then grab a container and go on busy mornings! If you're looking for a healthy and delicious meal that can be easily prepped ahead of time, this meal is a great option. It's packed with nutrients and flavor, and it's sure to keep you feeling full and satisfied. Here's to healthy and delicious meal prepping! If you are having trouble getting the output you want from prompts that use media files, there are some strategies that can help you get the results you want. The following sections provide design approaches and troubleshooting tips for improving prompts that use multimodal input. You can improve your multimodal prompts by following these best practices: Prompt design fundamentals Be specific in your instructions : Craft clear and concise instructions that leave minimal room for misinterpretation. Add a few examples to your prompt: Use realistic few-shot examples to illustrate what you want to achieve. Break it down step-by-step : Divide complex tasks into manageable sub-goals, guiding the model through the process. Specify the output format : In your prompt, ask for the output to be in the format you want, like markdown, JSON, HTML and more. Put your image first for single-image prompts : While Gemini can handle image and text inputs in any order, for prompts containing a single image, it might perform better if that image (or video) is placed before the text prompt. However, for prompts that require images to be highly interleaved with texts to make sense, use whatever order is most natural. Troubleshooting your multimodal prompt If the model is not drawing information from the relevant part of the image: Drop hints with which aspects of the image you want the prompt to draw information from. 
If the model output is too generic (not tailored enough to the image/video input): At the start of the prompt, try asking the model to describe the image(s) or video before providing the task instruction, or try asking the model to refer to what's in the image. To troubleshoot which part failed: Ask the model to describe the image, or ask the model to explain its reasoning, to gauge the model's initial understanding. If your prompt results in hallucinated content: Try dialing down the temperature setting or asking the model for shorter descriptions so that it's less likely to extrapolate additional details. Tuning the sampling parameters: Experiment with different temperature settings and top-k selections to adjust the model's creativity. Be specific in your instructions Prompts have the most success when they are clear and detailed. If you have a specific output in mind, it's better to include that requirement in the prompt to ensure you get the output you want. For this image of an airport board, asking the model to just "describe this image" could generate a general description. If you need the model to parse the time and city from the image, you can include that request directly in your prompt. Prompt Model response Describe this image. The image shows an airport arrivals and departures board. Updated prompt Improved response Parse the time and city from the airport board shown in this image into a list. 10:50 Moscow 11:05 Edinburgh 11:05 London 11:10 Bucharest 11:30 Kiev 11:35 Dublin 11:45 East Midlands 12:15 Sofia 12:30 London 12:30 Newcastle 12:40 St Petersburg 12:40 London 12:45 Manchester Add a few examples The Gemini model can accept multiple inputs which it can use as examples to understand the output you want. Adding these examples can help the model identify the patterns and apply the relationship between the given images and responses to the new example. This is also called "few-shot" learning. In the example below, the initial output is written in sentence form, and also contains the country (Brazil). Suppose you want the output in a different format or style, and you want only the city, not the country. Adding few-shot examples to your prompt can steer the model to respond in the way you want. Prompt Model response Determine the city along with the landmark. The landmark is the Christ the Redeemer statue in Rio de Janeiro, Brazil. Updated prompt Improved response Determine the city along with the landmark. city: Rome, landmark: the Colosseum. city: Beijing, landmark: Forbidden City city: Rio de Janeiro, landmark: Christ the Redeemer statue Break it down step-by-step For complex tasks like the ones that require both visual understanding and reasoning, it can be helpful to split the task into smaller, more straightforward steps. Alternatively, it could also be effective if you directly ask the model to “think step by step” in your prompt. Prompt Model response When will I run out of toilet paper? Soon, you only have 3 rolls left. Updated prompt Improved response 1. First, count how many toilet paper rolls are in this picture. 2. Then, determine how much toilet paper a typical person uses per day. 3. Calculate how long these rolls of toilet paper will last. 1. There are 3 rolls of toilet paper in this picture. 2. A typical person uses about 20 sheets of toilet paper per day. 3. If each roll contains 200 sheets, then each roll will last for about 10 days. Therefore, the 3 rolls will last for about a month. 
Math problems or other types of word problems are great candidates for asking the model to think step-by-step. Prompt Response What is the 4th term in the sequence? -135 The response from the model is incorrect. Some ways to improve this is to ask the model to parse the problem before answering the prompt. Prompt Response Parse the formula in the image first. Then based on the formula, answer "what's the 4th term in the sequence?". Click to expand the result $$ b(n) = b(n-1) * (-3) $$ $$ b(1) = 15 $$ Substitute n = 4 into the formula: $$ b(4) = b(4-1) * (-3) $$ $$ b(4) = b(3) * (-3) $$ We don't know the value of b(3), so substitute n = 3 into the formula: $$ b(3) = b(3-1) * (-3) $$ $$ b(3) = b(2) * (-3) $$ We don't know the value of b(2), so substitute n = 2 into the formula: $$ b(2) = b(2-1) * (-3) $$ $$ b(2) = b(1) * (-3) $$ We know the value of b(1) is 15, so substitute it into the formula: $$ b(2) = 15 * (-3) $$ $$ b(2) = -45 $$ Now that we know the value of b(2), substitute it into the formula for b(3): $$ b(3) = b(2) * (-3) $$ $$ b(3) = -45 * (-3) $$ $$ b(3) = 135 $$ Now that we know the value of b(3), substitute it into the formula for b(4): $$ b(4) = b(3) * (-3) $$ $$ b(4) = 135 * (-3) $$ $$ b(4) = -405 $$ Therefore, the 4th term in the sequence is -405. This model response is correct - now to make the response even more streamlined, you can ask the model to to think step-by-step. Updated prompt Improved response What is the 4th term in the sequence? Think step by step. The first term is 15. The second term is b(1) * (-3) = 15 * (-3) = -45. The third term is b(2) * (-3) = -45 * (-3) = 135. The fourth term is b(3) * (-3) = 135 * (-3) = -405. Specify the output format A common problem is the need for model output to be in a certain format (e.g. markdown, JSON, HTML), especially if the model output needs to be ingested by a downstream task. You can try to produce output in that format by instructing the model to do so within the prompt. Below are two examples: Prompt Model response Parse the table in this image into markdown format | Subject | Status | Time | Start date | Due on | | ------- | ------ | ---- | ---------- | ------ | | Math | Done | 1.5 hrs | 11/29/2023 | 12/1/2023 | | Geography | Done | 30 min | 12/2/2023 | 12/4/2023 | | Physics | In progress | 2 hrs | 12/3/2023 | 12/5/2023 | | Biology | Not started | 6 hrs | 11/2/2016 | 12/6/2023 | Prompt Model response Provide a list of all the following attributes: ingredients, type of cuisine, vegetarian or not, in JSON format ```json { "ingredients": [ "rice", "shrimp", "clams", "mussels", "peas", "tomatoes", "onions", "garlic", "olive oil", "paprika", "salt", "pepper" ], "type of cuisine": "Spanish", "vegetarian": false } ``` Put your image first for single-image prompts While Gemini can interpret image and text in any order within a prompt, placing a single image before the text prompt might lead to better results. In the following samples, you'll notice the image comes first before the text in each prompt. Troubleshooting your multimodal prompt You might need to troubleshoot your prompt if you are not getting a helpful response. Here are a few strategies you could try. If the model is not drawing information from the relevant part of the image To get a more specific response, you can point out which aspects of the image you want the prompt to draw information from to inform its response. Prompt Model response How many days will these diapers last a baby? The diapers will last for 198 days before they run out. 
Updated prompt Improved response How long will these diapers last before I run out? Use the weight shown on the box to determine the child's age, and use the total number of diapers in the box. Divide the total number by how many diapers the child goes through per day. The box contains 198 diapers. The box states the diapers are for a child weighing 22-37 lbs. The average child in this weight range is likely between 1 and 2 years old. A 2-year-old typically goes through 6 diapers per day, therefore, the diapers will last around 33 days (198/6). If the model output is too generic and not tailored enough to the image input To help the model tailor its response to the image(s), try asking it to describe the images before performing its reasoning task. Prompt Model response What is in common between these images? All four images are examples of landscape photography. Updated prompt Improved response First, describe what's in each image in detail. What's in common between these images? The first image shows a person packing a suitcase. The second image shows an empty hallway in a shopping mall. The third image shows a roll of toilet paper. The fourth image shows a dark and stormy sea. The commonality between these images is that they are all examples of liminal spaces. Liminal spaces are places or situations that are in between two other places or situations. They are often characterized by a sense of uncertainty or ambiguity. The first image shows the person in between being at home and being on a trip. The second image shows the hallway in between two stores. The third image is of toilet paper, which can represent a bathroom. A bathroom is a transitional space between private and public realms. It's a place where we prepare ourselves to enter or leave the outside world. The fourth image shows the sea in between being calm and being stormy. Liminal spaces can be both exciting and unsettling. They can be a place of new beginning or a place of endings. They can be a place of reflection or a place of action. You can also try explicitly asking the model to refer to what's in the image. Updated prompt Improved response What is in common between these images? Refer to what's in the images in your response. All of the images are related to travel. The first image shows a person packing a suitcase, the second image shows an empty hallway in a shopping mall, the third image shows a stack of toilet paper, and the fourth image shows a dark and stormy sea. These images are all related to travel because they are all things that people need or use when they travel. The first image shows a person packing a suitcase, which is something that people do when they are going on a trip. The second image shows an empty hallway in a shopping mall, which is a place where people often go to buy things for their trips. The third image shows a stack of toilet paper, which is something that people often use when they are traveling. The fourth image shows a dark and stormy sea, which is something that people often encounter when they are traveling. The same tips work for creative output from images. In the following example with the rabbit, explicitly asking the model to include aspects from the image can improve the output. Prompt Model response Can you write me a poem about this image? Little bunny, so soft and brown, Hopping through the field, up and down. Your ears so long, your tail so small, You're the cutest creature of them all. You nibble on the grass, so green and sweet, And hop away when you see me. 
But I'll always be your friend, Little bunny, until the very end. Updated prompt Improved response Can you write me a descriptive and dramatic poem about this image and include the location? Little bunny on the green, Ears so long, so keen. What do you see out there, In the distance, where you stare? Is it the ocean blue, Sparkling in the sun's hue? Or the waves that crash and roar, A rhythm that you adore? Little bunny, tell me true, What captures your attention so? Is it the salty breeze, Or the seagulls' cries that please? Perhaps it's the vastness of it all, A world so big and small. A place where dreams can roam, And adventures can be sown. Little bunny, I wonder what you dream, As you sit on the grass, so serene. Do you yearn to explore the deep, Or stay on land, where you can leap? Whatever it may be, little bunny, Keep that spark of wonder burning brightly. For in your dreams and aspirations, Lies a world waiting for your creation. Troubleshooting which part of the prompt failed It can be hard to know whether a prompt failed because the model didn't understand the image to begin with, or if it did understand the image but did not perform the correct reasoning steps afterward. To disambiguate between those reasons, ask the model to describe what's in the image. In the following example, if the model responds with a snack that seems surprising when paired with tea (e.g. popcorn), you can first troubleshoot to determine whether the model correctly recognized that the image contains tea. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? Describe what's in this image. Another strategy is to ask the model to explain its reasoning. That can help you narrow down which part of the reasoning broke down, if any. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? What's a snack I can make in 1 minute that would go well with this? Please explain why. What's next Try writing your own multimodal prompts using Google AI Studio . For information on using the Gemini Files API for uploading media files and including them in your prompts, see the Vision , Audio , and Document processing guides. For more guidance on prompt design, like tuning sampling parameters, see the Prompt strategies page. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
 
2
  Title: Files API  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Files API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Files API The Gemini family of artificial intelligence (AI) models is built to handle various types of input data, including text, images, and audio. Since these models can handle more than one type or mode of data, the Gemini models are called multimodal models or explained as having multimodal capabilities . This guide shows you how to work with media files using the Files API. The basic operations are the same for audio files, images, videos, documents, and other supported file types. For file prompting guidance, check out the File prompt guide section. Upload a file You can use the Files API to upload a media file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. The following code uploads a file and then uses the file in a call to generateContent . Python from google import genai client = genai . Client () myfile = client . files . upload ( file = "path/to/sample.mp3" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "Describe this audio clip" , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Describe this audio clip" , ]), }); console . log ( response . text ); } await main (); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } defer client . DeleteFile ( ctx , file . Name ) model := client . GenerativeModel ( "gemini-2.5-flash" ) resp , err := model . GenerateContent ( ctx , genai . FileData { URI : file . URI }, genai . Text ( "Describe this audio clip" )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) REST AUDIO_PATH = "path/to/sample.mp3" MIME_TYPE = $( file -b --mime-type " ${ AUDIO_PATH } " ) NUM_BYTES = $( wc -c < " ${ AUDIO_PATH } " ) DISPLAY_NAME = AUDIO tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl " ${ BASE_URL } /upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D " ${ tmp_header_file } " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. 
curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ AUDIO_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Describe this audio clip"}, {"file_data":{"mime_type": "${MIME_TYPE}", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Get metadata for a file You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get . Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) file_name = myfile . name myfile = client . files . get ( name = file_name ) print ( myfile ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; const fetchedFile = await ai . files . get ({ name : fileName }); console . log ( fetchedFile ); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } gotFile , err := client . GetFile ( ctx , file . Name ) if err != nil { log . Fatal ( err ) } fmt . Println ( "Got file:" , gotFile . Name ) REST # file_info.json was created in the upload example name = $( jq ".file.name" file_info.json ) # Get the file of interest to check state curl https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " > file_info.json # Print some information about the file you got name = $( jq ".file.name" file_info.json ) echo name = $name file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri List uploaded files You can upload multiple files using the Files API. The following code gets a list of all the files uploaded: Python print ( 'My files:' ) for f in client . files . list (): print ( ' ' , f . name ) JavaScript const listResponse = await ai . files . list ({ config : { pageSize : 10 } }); for await ( const file of listResponse ) { console . log ( file . name ); } Go iter := client . ListFiles ( ctx ) for { ifile , err := iter . Next () if err == iterator . Done { break } if err != nil { log . Fatal ( err ) } fmt . Println ( ifile . Name ) } REST echo "My files: " curl "https://generativelanguage.googleapis.com/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " Delete uploaded files Files are automatically deleted after 48 hours. You can also manually delete an uploaded file: Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) client . files . delete ( name = myfile . name ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; await ai . files . delete ({ name : fileName }); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } client . DeleteFile ( ctx , file . Name ) REST curl --request "DELETE" https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " Usage info You can use the Files API to upload and interact with media files. 
The Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. During that time, you can use the API to get metadata about the files, but you can't download the files. The Files API is available at no cost in all regions where the Gemini API is available. File prompting strategies This section provides guidance and best practices for using media files with prompts for the Gemini API. Being able to use various types of data in your prompts gives you more flexibility in terms of what tasks you can tackle with the Gemini API. For example, you can send the model a photo of a delicious meal and ask it to write a short blog about the meal. Prompt Response Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping. Meal prepping is a great way to save time and money, and it can also help you to eat healthier. This meal is a great example of a healthy and delicious meal that can be easily prepped ahead of time. This meal features brown rice, roasted vegetables, and chicken teriyaki. The brown rice is a healthy and complex carbohydrate that will give you sustained energy. The roasted vegetables are a great way to get your daily dose of vitamins and minerals, and the chicken teriyaki is a delicious and protein-rich option. This meal is also very easy to prepare. Simply cook the brown rice, roast the vegetables, and cook the chicken teriyaki. Once everything is cooked, divide it into meal prep containers and store them in the refrigerator. You can then grab a container and go on busy mornings! If you're looking for a healthy and delicious meal that can be easily prepped ahead of time, this meal is a great option. It's packed with nutrients and flavor, and it's sure to keep you feeling full and satisfied. Here's to healthy and delicious meal prepping! If you are having trouble getting the output you want from prompts that use media files, there are some strategies that can help you get the results you want. The following sections provide design approaches and troubleshooting tips for improving prompts that use multimodal input. You can improve your multimodal prompts by following these best practices: Prompt design fundamentals Be specific in your instructions : Craft clear and concise instructions that leave minimal room for misinterpretation. Add a few examples to your prompt: Use realistic few-shot examples to illustrate what you want to achieve. Break it down step-by-step : Divide complex tasks into manageable sub-goals, guiding the model through the process. Specify the output format : In your prompt, ask for the output to be in the format you want, like markdown, JSON, HTML and more. Put your image first for single-image prompts : While Gemini can handle image and text inputs in any order, for prompts containing a single image, it might perform better if that image (or video) is placed before the text prompt. However, for prompts that require images to be highly interleaved with texts to make sense, use whatever order is most natural. Troubleshooting your multimodal prompt If the model is not drawing information from the relevant part of the image: Drop hints with which aspects of the image you want the prompt to draw information from. 
If the model output is too generic (not tailored enough to the image/video input): At the start of the prompt, try asking the model to describe the image(s) or video before providing the task instruction, or try asking the model to refer to what's in the image. To troubleshoot which part failed: Ask the model to describe the image, or ask the model to explain its reasoning, to gauge the model's initial understanding. If your prompt results in hallucinated content: Try dialing down the temperature setting or asking the model for shorter descriptions so that it's less likely to extrapolate additional details. Tuning the sampling parameters: Experiment with different temperature settings and top-k selections to adjust the model's creativity. Be specific in your instructions Prompts have the most success when they are clear and detailed. If you have a specific output in mind, it's better to include that requirement in the prompt to ensure you get the output you want. For this image of an airport board, asking the model to just "describe this image" could generate a general description. If you need the model to parse the time and city from the image, you can include that request directly in your prompt. Prompt Model response Describe this image. The image shows an airport arrivals and departures board. Updated prompt Improved response Parse the time and city from the airport board shown in this image into a list. 10:50 Moscow 11:05 Edinburgh 11:05 London 11:10 Bucharest 11:30 Kiev 11:35 Dublin 11:45 East Midlands 12:15 Sofia 12:30 London 12:30 Newcastle 12:40 St Petersburg 12:40 London 12:45 Manchester Add a few examples The Gemini model can accept multiple inputs which it can use as examples to understand the output you want. Adding these examples can help the model identify the patterns and apply the relationship between the given images and responses to the new example. This is also called "few-shot" learning. In the example below, the initial output is written in sentence form, and also contains the country (Brazil). Suppose you want the output in a different format or style, and you want only the city, not the country. Adding few-shot examples to your prompt can steer the model to respond in the way you want. Prompt Model response Determine the city along with the landmark. The landmark is the Christ the Redeemer statue in Rio de Janeiro, Brazil. Updated prompt Improved response Determine the city along with the landmark. city: Rome, landmark: the Colosseum. city: Beijing, landmark: Forbidden City city: Rio de Janeiro, landmark: Christ the Redeemer statue Break it down step-by-step For complex tasks like the ones that require both visual understanding and reasoning, it can be helpful to split the task into smaller, more straightforward steps. Alternatively, it could also be effective if you directly ask the model to “think step by step” in your prompt. Prompt Model response When will I run out of toilet paper? Soon, you only have 3 rolls left. Updated prompt Improved response 1. First, count how many toilet paper rolls are in this picture. 2. Then, determine how much toilet paper a typical person uses per day. 3. Calculate how long these rolls of toilet paper will last. 1. There are 3 rolls of toilet paper in this picture. 2. A typical person uses about 20 sheets of toilet paper per day. 3. If each roll contains 200 sheets, then each roll will last for about 10 days. Therefore, the 3 rolls will last for about a month. 
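The same step-by-step framing can be sent to the model programmatically. Below is a minimal sketch, assuming the Python SDK shown earlier in this guide; the image path ("rolls.jpg") and the numbered instructions are placeholders taken from the example above, not values defined on this page.

```python
# Sketch: break a visual reasoning task into explicit numbered steps.
# "rolls.jpg" is a placeholder path; any supported image works.
from google import genai

client = genai.Client()

# Upload the image through the Files API.
image = client.files.upload(file="rolls.jpg")

prompt = (
    "1. First, count how many toilet paper rolls are in this picture.\n"
    "2. Then, determine how much toilet paper a typical person uses per day.\n"
    "3. Calculate how long these rolls of toilet paper will last."
)

# Single image: place it before the text, as recommended earlier on this page.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[image, prompt],
)
print(response.text)
```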
Math problems or other types of word problems are great candidates for asking the model to think step-by-step. Prompt Response What is the 4th term in the sequence? -135 The response from the model is incorrect. One way to improve this is to ask the model to parse the problem before answering the prompt. Prompt Response Parse the formula in the image first. Then based on the formula, answer "what's the 4th term in the sequence?". Click to expand the result $$ b(n) = b(n-1) * (-3) $$ $$ b(1) = 15 $$ Substitute n = 4 into the formula: $$ b(4) = b(4-1) * (-3) $$ $$ b(4) = b(3) * (-3) $$ We don't know the value of b(3), so substitute n = 3 into the formula: $$ b(3) = b(3-1) * (-3) $$ $$ b(3) = b(2) * (-3) $$ We don't know the value of b(2), so substitute n = 2 into the formula: $$ b(2) = b(2-1) * (-3) $$ $$ b(2) = b(1) * (-3) $$ We know the value of b(1) is 15, so substitute it into the formula: $$ b(2) = 15 * (-3) $$ $$ b(2) = -45 $$ Now that we know the value of b(2), substitute it into the formula for b(3): $$ b(3) = b(2) * (-3) $$ $$ b(3) = -45 * (-3) $$ $$ b(3) = 135 $$ Now that we know the value of b(3), substitute it into the formula for b(4): $$ b(4) = b(3) * (-3) $$ $$ b(4) = 135 * (-3) $$ $$ b(4) = -405 $$ Therefore, the 4th term in the sequence is -405. This model response is correct - now to make the response even more streamlined, you can ask the model to think step-by-step. Updated prompt Improved response What is the 4th term in the sequence? Think step by step. The first term is 15. The second term is b(1) * (-3) = 15 * (-3) = -45. The third term is b(2) * (-3) = -45 * (-3) = 135. The fourth term is b(3) * (-3) = 135 * (-3) = -405. Specify the output format A common problem is the need for model output to be in a certain format (e.g. markdown, JSON, HTML), especially if the model output needs to be ingested by a downstream task. You can try to produce output in that format by instructing the model to do so within the prompt. Below are two examples: Prompt Model response Parse the table in this image into markdown format | Subject | Status | Time | Start date | Due on | | ------- | ------ | ---- | ---------- | ------ | | Math | Done | 1.5 hrs | 11/29/2023 | 12/1/2023 | | Geography | Done | 30 min | 12/2/2023 | 12/4/2023 | | Physics | In progress | 2 hrs | 12/3/2023 | 12/5/2023 | | Biology | Not started | 6 hrs | 11/2/2016 | 12/6/2023 | Prompt Model response Provide a list of all the following attributes: ingredients, type of cuisine, vegetarian or not, in JSON format ```json { "ingredients": [ "rice", "shrimp", "clams", "mussels", "peas", "tomatoes", "onions", "garlic", "olive oil", "paprika", "salt", "pepper" ], "type of cuisine": "Spanish", "vegetarian": false } ``` Put your image first for single-image prompts While Gemini can interpret image and text in any order within a prompt, placing a single image before the text prompt might lead to better results. In the following samples, you'll notice the image comes before the text in each prompt. Troubleshooting your multimodal prompt You might need to troubleshoot your prompt if you are not getting a helpful response. Here are a few strategies you could try. If the model is not drawing information from the relevant part of the image To get a more specific response, you can point out which aspects of the image you want the prompt to draw information from to inform its response. Prompt Model response How many days will these diapers last a baby? The diapers will last for 198 days before they run out. 
Updated prompt Improved response How long will these diapers last before I run out? Use the weight shown on the box to determine the child's age, and use the total number of diapers in the box. Divide the total number by how many diapers the child goes through per day. The box contains 198 diapers. The box states the diapers are for a child weighing 22-37 lbs. The average child in this weight range is likely between 1 and 2 years old. A 2-year-old typically goes through 6 diapers per day, therefore, the diapers will last around 33 days (198/6). If the model output is too generic and not tailored enough to the image input To help the model tailor its response to the image(s), try asking it to describe the images before performing its reasoning task. Prompt Model response What is in common between these images? All four images are examples of landscape photography. Updated prompt Improved response First, describe what's in each image in detail. What's in common between these images? The first image shows a person packing a suitcase. The second image shows an empty hallway in a shopping mall. The third image shows a roll of toilet paper. The fourth image shows a dark and stormy sea. The commonality between these images is that they are all examples of liminal spaces. Liminal spaces are places or situations that are in between two other places or situations. They are often characterized by a sense of uncertainty or ambiguity. The first image shows the person in between being at home and being on a trip. The second image shows the hallway in between two stores. The third image is of toilet paper, which can represent a bathroom. A bathroom is a transitional space between private and public realms. It's a place where we prepare ourselves to enter or leave the outside world. The fourth image shows the sea in between being calm and being stormy. Liminal spaces can be both exciting and unsettling. They can be a place of new beginning or a place of endings. They can be a place of reflection or a place of action. You can also try explicitly asking the model to refer to what's in the image. Updated prompt Improved response What is in common between these images? Refer to what's in the images in your response. All of the images are related to travel. The first image shows a person packing a suitcase, the second image shows an empty hallway in a shopping mall, the third image shows a stack of toilet paper, and the fourth image shows a dark and stormy sea. These images are all related to travel because they are all things that people need or use when they travel. The first image shows a person packing a suitcase, which is something that people do when they are going on a trip. The second image shows an empty hallway in a shopping mall, which is a place where people often go to buy things for their trips. The third image shows a stack of toilet paper, which is something that people often use when they are traveling. The fourth image shows a dark and stormy sea, which is something that people often encounter when they are traveling. The same tips work for creative output from images. In the following example with the rabbit, explicitly asking the model to include aspects from the image can improve the output. Prompt Model response Can you write me a poem about this image? Little bunny, so soft and brown, Hopping through the field, up and down. Your ears so long, your tail so small, You're the cutest creature of them all. You nibble on the grass, so green and sweet, And hop away when you see me. 
But I'll always be your friend, Little bunny, until the very end. Updated prompt Improved response Can you write me a descriptive and dramatic poem about this image and include the location? Little bunny on the green, Ears so long, so keen. What do you see out there, In the distance, where you stare? Is it the ocean blue, Sparkling in the sun's hue? Or the waves that crash and roar, A rhythm that you adore? Little bunny, tell me true, What captures your attention so? Is it the salty breeze, Or the seagulls' cries that please? Perhaps it's the vastness of it all, A world so big and small. A place where dreams can roam, And adventures can be sown. Little bunny, I wonder what you dream, As you sit on the grass, so serene. Do you yearn to explore the deep, Or stay on land, where you can leap? Whatever it may be, little bunny, Keep that spark of wonder burning brightly. For in your dreams and aspirations, Lies a world waiting for your creation. Troubleshooting which part of the prompt failed It can be hard to know whether a prompt failed because the model didn't understand the image to begin with, or if it did understand the image but did not perform the correct reasoning steps afterward. To disambiguate between those reasons, ask the model to describe what's in the image. In the following example, if the model responds with a snack that seems surprising when paired with tea (e.g. popcorn), you can first troubleshoot to determine whether the model correctly recognized that the image contains tea. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? Describe what's in this image. Another strategy is to ask the model to explain its reasoning. That can help you narrow down which part of the reasoning broke down, if any. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? What's a snack I can make in 1 minute that would go well with this? Please explain why. What's next Try writing your own multimodal prompts using Google AI Studio . For information on using the Gemini Files API for uploading media files and including them in your prompts, see the Vision , Audio , and Document processing guides. For more guidance on prompt design, like tuning sampling parameters, see the Prompt strategies page. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
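The troubleshooting probes described above (asking the model to describe the image, and asking it to explain its reasoning) map onto two extra generateContent calls. The following is a minimal sketch, assuming the Python SDK used throughout this page; "tea.jpg" is a placeholder image path, not a file referenced by the guide.

```python
# Sketch: isolate whether a multimodal prompt fails at perception or reasoning.
# "tea.jpg" is a placeholder; substitute the image you are debugging.
from google import genai

client = genai.Client()
image = client.files.upload(file="tea.jpg")

# Probe 1: did the model understand the image at all?
describe = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[image, "Describe what's in this image."],
)
print("Description:", describe.text)

# Probe 2: ask for the answer plus the reasoning behind it.
reason = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[image, "What's a snack I can make in 1 minute that would go well "
                     "with this? Please explain why."],
)
print("Answer with reasoning:", reason.text)
```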
text_content/docs_files_97554b81.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/files#main-content
  Title: Files API  |  Gemini API  |  Google AI for Developers
  ==================================================
 
- Files API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Files API The Gemini family of artificial intelligence (AI) models is built to handle various types of input data, including text, images, and audio. Since these models can handle more than one type or mode of data, the Gemini models are called multimodal models or explained as having multimodal capabilities . This guide shows you how to work with media files using the Files API. The basic operations are the same for audio files, images, videos, documents, and other supported file types. For file prompting guidance, check out the File prompt guide section. Upload a file You can use the Files API to upload a media file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. The following code uploads a file and then uses the file in a call to generateContent . Python from google import genai client = genai . Client () myfile = client . files . upload ( file = "path/to/sample.mp3" ) response = client . models . generate_content ( model = "gemini-2.0-flash" , contents = [ "Describe this audio clip" , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Describe this audio clip" , ]), }); console . log ( response . text ); } await main (); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } defer client . DeleteFile ( ctx , file . Name ) model := client . GenerativeModel ( "gemini-2.0-flash" ) resp , err := model . GenerateContent ( ctx , genai . FileData { URI : file . URI }, genai . Text ( "Describe this audio clip" )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) REST AUDIO_PATH = "path/to/sample.mp3" MIME_TYPE = $( file -b --mime-type " ${ AUDIO_PATH } " ) NUM_BYTES = $( wc -c < " ${ AUDIO_PATH } " ) DISPLAY_NAME = AUDIO tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl " ${ BASE_URL } /upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D " ${ tmp_header_file } " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. 
curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ AUDIO_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Describe this audio clip"}, {"file_data":{"mime_type": "${MIME_TYPE}", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Get metadata for a file You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get . Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) file_name = myfile . name myfile = client . files . get ( name = file_name ) print ( myfile ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; const fetchedFile = await ai . files . get ({ name : fileName }); console . log ( fetchedFile ); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } gotFile , err := client . GetFile ( ctx , file . Name ) if err != nil { log . Fatal ( err ) } fmt . Println ( "Got file:" , gotFile . Name ) REST # file_info.json was created in the upload example name = $( jq ".file.name" file_info.json ) # Get the file of interest to check state curl https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " > file_info.json # Print some information about the file you got name = $( jq ".file.name" file_info.json ) echo name = $name file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri List uploaded files You can upload multiple files using the Files API. The following code gets a list of all the files uploaded: Python print ( 'My files:' ) for f in client . files . list (): print ( ' ' , f . name ) JavaScript const listResponse = await ai . files . list ({ config : { pageSize : 10 } }); for await ( const file of listResponse ) { console . log ( file . name ); } Go iter := client . ListFiles ( ctx ) for { ifile , err := iter . Next () if err == iterator . Done { break } if err != nil { log . Fatal ( err ) } fmt . Println ( ifile . Name ) } REST echo "My files: " curl "https://generativelanguage.googleapis.com/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " Delete uploaded files Files are automatically deleted after 48 hours. You can also manually delete an uploaded file: Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) client . files . delete ( name = myfile . name ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; await ai . files . delete ({ name : fileName }); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } client . DeleteFile ( ctx , file . Name ) REST curl --request "DELETE" https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " Usage info You can use the Files API to upload and interact with media files. 
The Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. During that time, you can use the API to get metadata about the files, but you can't download the files. The Files API is available at no cost in all regions where the Gemini API is available. File prompting strategies This section provides guidance and best practices for using media files with prompts for the Gemini API. Being able to use various types of data in your prompts gives you more flexibility in terms of what tasks you can tackle with the Gemini API. For example, you can send the model a photo of a delicious meal and ask it to write a short blog about the meal. Prompt Response Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping. Meal prepping is a great way to save time and money, and it can also help you to eat healthier. This meal is a great example of a healthy and delicious meal that can be easily prepped ahead of time. This meal features brown rice, roasted vegetables, and chicken teriyaki. The brown rice is a healthy and complex carbohydrate that will give you sustained energy. The roasted vegetables are a great way to get your daily dose of vitamins and minerals, and the chicken teriyaki is a delicious and protein-rich option. This meal is also very easy to prepare. Simply cook the brown rice, roast the vegetables, and cook the chicken teriyaki. Once everything is cooked, divide it into meal prep containers and store them in the refrigerator. You can then grab a container and go on busy mornings! If you're looking for a healthy and delicious meal that can be easily prepped ahead of time, this meal is a great option. It's packed with nutrients and flavor, and it's sure to keep you feeling full and satisfied. Here's to healthy and delicious meal prepping! If you are having trouble getting the output you want from prompts that use media files, there are some strategies that can help you get the results you want. The following sections provide design approaches and troubleshooting tips for improving prompts that use multimodal input. You can improve your multimodal prompts by following these best practices: Prompt design fundamentals Be specific in your instructions : Craft clear and concise instructions that leave minimal room for misinterpretation. Add a few examples to your prompt: Use realistic few-shot examples to illustrate what you want to achieve. Break it down step-by-step : Divide complex tasks into manageable sub-goals, guiding the model through the process. Specify the output format : In your prompt, ask for the output to be in the format you want, like markdown, JSON, HTML and more. Put your image first for single-image prompts : While Gemini can handle image and text inputs in any order, for prompts containing a single image, it might perform better if that image (or video) is placed before the text prompt. However, for prompts that require images to be highly interleaved with texts to make sense, use whatever order is most natural. Troubleshooting your multimodal prompt If the model is not drawing information from the relevant part of the image: Drop hints with which aspects of the image you want the prompt to draw information from. 
If the model output is too generic (not tailored enough to the image/video input): At the start of the prompt, try asking the model to describe the image(s) or video before providing the task instruction, or try asking the model to refer to what's in the image. To troubleshoot which part failed: Ask the model to describe the image, or ask the model to explain its reasoning, to gauge the model's initial understanding. If your prompt results in hallucinated content: Try dialing down the temperature setting or asking the model for shorter descriptions so that it's less likely to extrapolate additional details. Tuning the sampling parameters: Experiment with different temperature settings and top-k selections to adjust the model's creativity. Be specific in your instructions Prompts have the most success when they are clear and detailed. If you have a specific output in mind, it's better to include that requirement in the prompt to ensure you get the output you want. For this image of an airport board, asking the model to just "describe this image" could generate a general description. If you need the model to parse the time and city from the image, you can include that request directly in your prompt. Prompt Model response Describe this image. The image shows an airport arrivals and departures board. Updated prompt Improved response Parse the time and city from the airport board shown in this image into a list. 10:50 Moscow 11:05 Edinburgh 11:05 London 11:10 Bucharest 11:30 Kiev 11:35 Dublin 11:45 East Midlands 12:15 Sofia 12:30 London 12:30 Newcastle 12:40 St Petersburg 12:40 London 12:45 Manchester Add a few examples The Gemini model can accept multiple inputs which it can use as examples to understand the output you want. Adding these examples can help the model identify the patterns and apply the relationship between the given images and responses to the new example. This is also called "few-shot" learning. In the example below, the initial output is written in sentence form, and also contains the country (Brazil). Suppose you want the output in a different format or style, and you want only the city, not the country. Adding few-shot examples to your prompt can steer the model to respond in the way you want. Prompt Model response Determine the city along with the landmark. The landmark is the Christ the Redeemer statue in Rio de Janeiro, Brazil. Updated prompt Improved response Determine the city along with the landmark. city: Rome, landmark: the Colosseum. city: Beijing, landmark: Forbidden City city: Rio de Janeiro, landmark: Christ the Redeemer statue Break it down step-by-step For complex tasks like the ones that require both visual understanding and reasoning, it can be helpful to split the task into smaller, more straightforward steps. Alternatively, it could also be effective if you directly ask the model to “think step by step” in your prompt. Prompt Model response When will I run out of toilet paper? Soon, you only have 3 rolls left. Updated prompt Improved response 1. First, count how many toilet paper rolls are in this picture. 2. Then, determine how much toilet paper a typical person uses per day. 3. Calculate how long these rolls of toilet paper will last. 1. There are 3 rolls of toilet paper in this picture. 2. A typical person uses about 20 sheets of toilet paper per day. 3. If each roll contains 200 sheets, then each roll will last for about 10 days. Therefore, the 3 rolls will last for about a month. 
Math problems or other types of word problems are great candidates for asking the model to think step-by-step. Prompt Response What is the 4th term in the sequence? -135 The response from the model is incorrect. Some ways to improve this is to ask the model to parse the problem before answering the prompt. Prompt Response Parse the formula in the image first. Then based on the formula, answer "what's the 4th term in the sequence?". Click to expand the result $$ b(n) = b(n-1) * (-3) $$ $$ b(1) = 15 $$ Substitute n = 4 into the formula: $$ b(4) = b(4-1) * (-3) $$ $$ b(4) = b(3) * (-3) $$ We don't know the value of b(3), so substitute n = 3 into the formula: $$ b(3) = b(3-1) * (-3) $$ $$ b(3) = b(2) * (-3) $$ We don't know the value of b(2), so substitute n = 2 into the formula: $$ b(2) = b(2-1) * (-3) $$ $$ b(2) = b(1) * (-3) $$ We know the value of b(1) is 15, so substitute it into the formula: $$ b(2) = 15 * (-3) $$ $$ b(2) = -45 $$ Now that we know the value of b(2), substitute it into the formula for b(3): $$ b(3) = b(2) * (-3) $$ $$ b(3) = -45 * (-3) $$ $$ b(3) = 135 $$ Now that we know the value of b(3), substitute it into the formula for b(4): $$ b(4) = b(3) * (-3) $$ $$ b(4) = 135 * (-3) $$ $$ b(4) = -405 $$ Therefore, the 4th term in the sequence is -405. This model response is correct - now to make the response even more streamlined, you can ask the model to to think step-by-step. Updated prompt Improved response What is the 4th term in the sequence? Think step by step. The first term is 15. The second term is b(1) * (-3) = 15 * (-3) = -45. The third term is b(2) * (-3) = -45 * (-3) = 135. The fourth term is b(3) * (-3) = 135 * (-3) = -405. Specify the output format A common problem is the need for model output to be in a certain format (e.g. markdown, JSON, HTML), especially if the model output needs to be ingested by a downstream task. You can try to produce output in that format by instructing the model to do so within the prompt. Below are two examples: Prompt Model response Parse the table in this image into markdown format | Subject | Status | Time | Start date | Due on | | ------- | ------ | ---- | ---------- | ------ | | Math | Done | 1.5 hrs | 11/29/2023 | 12/1/2023 | | Geography | Done | 30 min | 12/2/2023 | 12/4/2023 | | Physics | In progress | 2 hrs | 12/3/2023 | 12/5/2023 | | Biology | Not started | 6 hrs | 11/2/2016 | 12/6/2023 | Prompt Model response Provide a list of all the following attributes: ingredients, type of cuisine, vegetarian or not, in JSON format ```json { "ingredients": [ "rice", "shrimp", "clams", "mussels", "peas", "tomatoes", "onions", "garlic", "olive oil", "paprika", "salt", "pepper" ], "type of cuisine": "Spanish", "vegetarian": false } ``` Put your image first for single-image prompts While Gemini can interpret image and text in any order within a prompt, placing a single image before the text prompt might lead to better results. In the following samples, you'll notice the image comes first before the text in each prompt. Troubleshooting your multimodal prompt You might need to troubleshoot your prompt if you are not getting a helpful response. Here are a few strategies you could try. If the model is not drawing information from the relevant part of the image To get a more specific response, you can point out which aspects of the image you want the prompt to draw information from to inform its response. Prompt Model response How many days will these diapers last a baby? The diapers will last for 198 days before they run out. 
Updated prompt Improved response How long will these diapers last before I run out? Use the weight shown on the box to determine the child's age, and use the total number of diapers in the box. Divide the total number by how many diapers the child goes through per day. The box contains 198 diapers. The box states the diapers are for a child weighing 22-37 lbs. The average child in this weight range is likely between 1 and 2 years old. A 2-year-old typically goes through 6 diapers per day, therefore, the diapers will last around 33 days (198/6). If the model output is too generic and not tailored enough to the image input To help the model tailor its response to the image(s), try asking it to describe the images before performing its reasoning task. Prompt Model response What is in common between these images? All four images are examples of landscape photography. Updated prompt Improved response First, describe what's in each image in detail. What's in common between these images? The first image shows a person packing a suitcase. The second image shows an empty hallway in a shopping mall. The third image shows a roll of toilet paper. The fourth image shows a dark and stormy sea. The commonality between these images is that they are all examples of liminal spaces. Liminal spaces are places or situations that are in between two other places or situations. They are often characterized by a sense of uncertainty or ambiguity. The first image shows the person in between being at home and being on a trip. The second image shows the hallway in between two stores. The third image is of toilet paper, which can represent a bathroom. A bathroom is a transitional space between private and public realms. It's a place where we prepare ourselves to enter or leave the outside world. The fourth image shows the sea in between being calm and being stormy. Liminal spaces can be both exciting and unsettling. They can be a place of new beginning or a place of endings. They can be a place of reflection or a place of action. You can also try explicitly asking the model to refer to what's in the image. Updated prompt Improved response What is in common between these images? Refer to what's in the images in your response. All of the images are related to travel. The first image shows a person packing a suitcase, the second image shows an empty hallway in a shopping mall, the third image shows a stack of toilet paper, and the fourth image shows a dark and stormy sea. These images are all related to travel because they are all things that people need or use when they travel. The first image shows a person packing a suitcase, which is something that people do when they are going on a trip. The second image shows an empty hallway in a shopping mall, which is a place where people often go to buy things for their trips. The third image shows a stack of toilet paper, which is something that people often use when they are traveling. The fourth image shows a dark and stormy sea, which is something that people often encounter when they are traveling. The same tips work for creative output from images. In the following example with the rabbit, explicitly asking the model to include aspects from the image can improve the output. Prompt Model response Can you write me a poem about this image? Little bunny, so soft and brown, Hopping through the field, up and down. Your ears so long, your tail so small, You're the cutest creature of them all. You nibble on the grass, so green and sweet, And hop away when you see me. 
But I'll always be your friend, Little bunny, until the very end. Updated prompt Improved response Can you write me a descriptive and dramatic poem about this image and include the location? Little bunny on the green, Ears so long, so keen. What do you see out there, In the distance, where you stare? Is it the ocean blue, Sparkling in the sun's hue? Or the waves that crash and roar, A rhythm that you adore? Little bunny, tell me true, What captures your attention so? Is it the salty breeze, Or the seagulls' cries that please? Perhaps it's the vastness of it all, A world so big and small. A place where dreams can roam, And adventures can be sown. Little bunny, I wonder what you dream, As you sit on the grass, so serene. Do you yearn to explore the deep, Or stay on land, where you can leap? Whatever it may be, little bunny, Keep that spark of wonder burning brightly. For in your dreams and aspirations, Lies a world waiting for your creation. Troubleshooting which part of the prompt failed It can be hard to know whether a prompt failed because the model didn't understand the image to begin with, or if it did understand the image but did not perform the correct reasoning steps afterward. To disambiguate between those reasons, ask the model to describe what's in the image. In the following example, if the model responds with a snack that seems surprising when paired with tea (e.g. popcorn), you can first troubleshoot to determine whether the model correctly recognized that the image contains tea. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? Describe what's in this image. Another strategy is to ask the model to explain its reasoning. That can help you narrow down which part of the reasoning broke down, if any. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? What's a snack I can make in 1 minute that would go well with this? Please explain why. What's next Try writing your own multimodal prompts using Google AI Studio . For information on using the Gemini Files API for uploading media files and including them in your prompts, see the Vision , Audio , and Document processing guides. For more guidance on prompt design, like tuning sampling parameters, see the Prompt strategies page. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
 
+ Files API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Files API The Gemini family of artificial intelligence (AI) models is built to handle various types of input data, including text, images, and audio. Since these models can handle more than one type or mode of data, the Gemini models are called multimodal models or explained as having multimodal capabilities . This guide shows you how to work with media files using the Files API. The basic operations are the same for audio files, images, videos, documents, and other supported file types. For file prompting guidance, check out the File prompt guide section. Upload a file You can use the Files API to upload a media file. Always use the Files API when the total request size (including the files, text prompt, system instructions, etc.) is larger than 20 MB. The following code uploads a file and then uses the file in a call to generateContent . Python from google import genai client = genai . Client () myfile = client . files . upload ( file = "path/to/sample.mp3" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "Describe this audio clip" , myfile ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Describe this audio clip" , ]), }); console . log ( response . text ); } await main (); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } defer client . DeleteFile ( ctx , file . Name ) model := client . GenerativeModel ( "gemini-2.5-flash" ) resp , err := model . GenerateContent ( ctx , genai . FileData { URI : file . URI }, genai . Text ( "Describe this audio clip" )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) REST AUDIO_PATH = "path/to/sample.mp3" MIME_TYPE = $( file -b --mime-type " ${ AUDIO_PATH } " ) NUM_BYTES = $( wc -c < " ${ AUDIO_PATH } " ) DISPLAY_NAME = AUDIO tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl " ${ BASE_URL } /upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D " ${ tmp_header_file } " \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. 
curl " ${ upload_url } " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ AUDIO_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "Describe this audio clip"}, {"file_data":{"mime_type": "${MIME_TYPE}", "file_uri": ' $file_uri '}}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Get metadata for a file You can verify that the API successfully stored the uploaded file and get its metadata by calling files.get . Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) file_name = myfile . name myfile = client . files . get ( name = file_name ) print ( myfile ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; const fetchedFile = await ai . files . get ({ name : fileName }); console . log ( fetchedFile ); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } gotFile , err := client . GetFile ( ctx , file . Name ) if err != nil { log . Fatal ( err ) } fmt . Println ( "Got file:" , gotFile . Name ) REST # file_info.json was created in the upload example name = $( jq ".file.name" file_info.json ) # Get the file of interest to check state curl https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " > file_info.json # Print some information about the file you got name = $( jq ".file.name" file_info.json ) echo name = $name file_uri = $( jq ".file.uri" file_info.json ) echo file_uri = $file_uri List uploaded files You can upload multiple files using the Files API. The following code gets a list of all the files uploaded: Python print ( 'My files:' ) for f in client . files . list (): print ( ' ' , f . name ) JavaScript const listResponse = await ai . files . list ({ config : { pageSize : 10 } }); for await ( const file of listResponse ) { console . log ( file . name ); } Go iter := client . ListFiles ( ctx ) for { ifile , err := iter . Next () if err == iterator . Done { break } if err != nil { log . Fatal ( err ) } fmt . Println ( ifile . Name ) } REST echo "My files: " curl "https://generativelanguage.googleapis.com/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " Delete uploaded files Files are automatically deleted after 48 hours. You can also manually delete an uploaded file: Python myfile = client . files . upload ( file = 'path/to/sample.mp3' ) client . files . delete ( name = myfile . name ) JavaScript const myfile = await ai . files . upload ({ file : "path/to/sample.mp3" , config : { mimeType : "audio/mpeg" }, }); const fileName = myfile . name ; await ai . files . delete ({ name : fileName }); Go file , err := client . UploadFileFromPath ( ctx , "path/to/sample.mp3" , nil ) if err != nil { log . Fatal ( err ) } client . DeleteFile ( ctx , file . Name ) REST curl --request "DELETE" https://generativelanguage.googleapis.com/v1beta/files/ $name \ -H "x-goog-api-key: $GEMINI_API_KEY " Usage info You can use the Files API to upload and interact with media files. 
The Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours. During that time, you can use the API to get metadata about the files, but you can't download the files. The Files API is available at no cost in all regions where the Gemini API is available. File prompting strategies This section provides guidance and best practices for using media files with prompts for the Gemini API. Being able to use various types of data in your prompts gives you more flexibility in terms of what tasks you can tackle with the Gemini API. For example, you can send the model a photo of a delicious meal and ask it to write a short blog about the meal. Prompt Response Write a short, engaging blog post based on this picture. It should include a description of the meal in the photo and talk about my journey meal prepping. Meal prepping is a great way to save time and money, and it can also help you to eat healthier. This meal is a great example of a healthy and delicious meal that can be easily prepped ahead of time. This meal features brown rice, roasted vegetables, and chicken teriyaki. The brown rice is a healthy and complex carbohydrate that will give you sustained energy. The roasted vegetables are a great way to get your daily dose of vitamins and minerals, and the chicken teriyaki is a delicious and protein-rich option. This meal is also very easy to prepare. Simply cook the brown rice, roast the vegetables, and cook the chicken teriyaki. Once everything is cooked, divide it into meal prep containers and store them in the refrigerator. You can then grab a container and go on busy mornings! If you're looking for a healthy and delicious meal that can be easily prepped ahead of time, this meal is a great option. It's packed with nutrients and flavor, and it's sure to keep you feeling full and satisfied. Here's to healthy and delicious meal prepping! If you are having trouble getting the output you want from prompts that use media files, there are some strategies that can help you get the results you want. The following sections provide design approaches and troubleshooting tips for improving prompts that use multimodal input. You can improve your multimodal prompts by following these best practices: Prompt design fundamentals Be specific in your instructions : Craft clear and concise instructions that leave minimal room for misinterpretation. Add a few examples to your prompt: Use realistic few-shot examples to illustrate what you want to achieve. Break it down step-by-step : Divide complex tasks into manageable sub-goals, guiding the model through the process. Specify the output format : In your prompt, ask for the output to be in the format you want, like markdown, JSON, HTML and more. Put your image first for single-image prompts : While Gemini can handle image and text inputs in any order, for prompts containing a single image, it might perform better if that image (or video) is placed before the text prompt. However, for prompts that require images to be highly interleaved with texts to make sense, use whatever order is most natural. Troubleshooting your multimodal prompt If the model is not drawing information from the relevant part of the image: Drop hints with which aspects of the image you want the prompt to draw information from. 
If the model output is too generic (not tailored enough to the image/video input): At the start of the prompt, try asking the model to describe the image(s) or video before providing the task instruction, or try asking the model to refer to what's in the image. To troubleshoot which part failed: Ask the model to describe the image, or ask the model to explain its reasoning, to gauge the model's initial understanding. If your prompt results in hallucinated content: Try dialing down the temperature setting or asking the model for shorter descriptions so that it's less likely to extrapolate additional details. Tuning the sampling parameters: Experiment with different temperature settings and top-k selections to adjust the model's creativity. Be specific in your instructions Prompts have the most success when they are clear and detailed. If you have a specific output in mind, it's better to include that requirement in the prompt to ensure you get the output you want. For this image of an airport board, asking the model to just "describe this image" could generate a general description. If you need the model to parse the time and city from the image, you can include that request directly in your prompt. Prompt Model response Describe this image. The image shows an airport arrivals and departures board. Updated prompt Improved response Parse the time and city from the airport board shown in this image into a list. 10:50 Moscow 11:05 Edinburgh 11:05 London 11:10 Bucharest 11:30 Kiev 11:35 Dublin 11:45 East Midlands 12:15 Sofia 12:30 London 12:30 Newcastle 12:40 St Petersburg 12:40 London 12:45 Manchester Add a few examples The Gemini model can accept multiple inputs which it can use as examples to understand the output you want. Adding these examples can help the model identify the patterns and apply the relationship between the given images and responses to the new example. This is also called "few-shot" learning. In the example below, the initial output is written in sentence form, and also contains the country (Brazil). Suppose you want the output in a different format or style, and you want only the city, not the country. Adding few-shot examples to your prompt can steer the model to respond in the way you want. Prompt Model response Determine the city along with the landmark. The landmark is the Christ the Redeemer statue in Rio de Janeiro, Brazil. Updated prompt Improved response Determine the city along with the landmark. city: Rome, landmark: the Colosseum. city: Beijing, landmark: Forbidden City city: Rio de Janeiro, landmark: Christ the Redeemer statue Break it down step-by-step For complex tasks like the ones that require both visual understanding and reasoning, it can be helpful to split the task into smaller, more straightforward steps. Alternatively, it could also be effective if you directly ask the model to “think step by step” in your prompt. Prompt Model response When will I run out of toilet paper? Soon, you only have 3 rolls left. Updated prompt Improved response 1. First, count how many toilet paper rolls are in this picture. 2. Then, determine how much toilet paper a typical person uses per day. 3. Calculate how long these rolls of toilet paper will last. 1. There are 3 rolls of toilet paper in this picture. 2. A typical person uses about 20 sheets of toilet paper per day. 3. If each roll contains 200 sheets, then each roll will last for about 10 days. Therefore, the 3 rolls will last for about a month. 
Math problems or other types of word problems are great candidates for asking the model to think step-by-step. Prompt Response What is the 4th term in the sequence? -135 The response from the model is incorrect. Some ways to improve this is to ask the model to parse the problem before answering the prompt. Prompt Response Parse the formula in the image first. Then based on the formula, answer "what's the 4th term in the sequence?". Click to expand the result $$ b(n) = b(n-1) * (-3) $$ $$ b(1) = 15 $$ Substitute n = 4 into the formula: $$ b(4) = b(4-1) * (-3) $$ $$ b(4) = b(3) * (-3) $$ We don't know the value of b(3), so substitute n = 3 into the formula: $$ b(3) = b(3-1) * (-3) $$ $$ b(3) = b(2) * (-3) $$ We don't know the value of b(2), so substitute n = 2 into the formula: $$ b(2) = b(2-1) * (-3) $$ $$ b(2) = b(1) * (-3) $$ We know the value of b(1) is 15, so substitute it into the formula: $$ b(2) = 15 * (-3) $$ $$ b(2) = -45 $$ Now that we know the value of b(2), substitute it into the formula for b(3): $$ b(3) = b(2) * (-3) $$ $$ b(3) = -45 * (-3) $$ $$ b(3) = 135 $$ Now that we know the value of b(3), substitute it into the formula for b(4): $$ b(4) = b(3) * (-3) $$ $$ b(4) = 135 * (-3) $$ $$ b(4) = -405 $$ Therefore, the 4th term in the sequence is -405. This model response is correct - now to make the response even more streamlined, you can ask the model to to think step-by-step. Updated prompt Improved response What is the 4th term in the sequence? Think step by step. The first term is 15. The second term is b(1) * (-3) = 15 * (-3) = -45. The third term is b(2) * (-3) = -45 * (-3) = 135. The fourth term is b(3) * (-3) = 135 * (-3) = -405. Specify the output format A common problem is the need for model output to be in a certain format (e.g. markdown, JSON, HTML), especially if the model output needs to be ingested by a downstream task. You can try to produce output in that format by instructing the model to do so within the prompt. Below are two examples: Prompt Model response Parse the table in this image into markdown format | Subject | Status | Time | Start date | Due on | | ------- | ------ | ---- | ---------- | ------ | | Math | Done | 1.5 hrs | 11/29/2023 | 12/1/2023 | | Geography | Done | 30 min | 12/2/2023 | 12/4/2023 | | Physics | In progress | 2 hrs | 12/3/2023 | 12/5/2023 | | Biology | Not started | 6 hrs | 11/2/2016 | 12/6/2023 | Prompt Model response Provide a list of all the following attributes: ingredients, type of cuisine, vegetarian or not, in JSON format ```json { "ingredients": [ "rice", "shrimp", "clams", "mussels", "peas", "tomatoes", "onions", "garlic", "olive oil", "paprika", "salt", "pepper" ], "type of cuisine": "Spanish", "vegetarian": false } ``` Put your image first for single-image prompts While Gemini can interpret image and text in any order within a prompt, placing a single image before the text prompt might lead to better results. In the following samples, you'll notice the image comes first before the text in each prompt. Troubleshooting your multimodal prompt You might need to troubleshoot your prompt if you are not getting a helpful response. Here are a few strategies you could try. If the model is not drawing information from the relevant part of the image To get a more specific response, you can point out which aspects of the image you want the prompt to draw information from to inform its response. Prompt Model response How many days will these diapers last a baby? The diapers will last for 198 days before they run out. 
Updated prompt Improved response How long will these diapers last before I run out? Use the weight shown on the box to determine the child's age, and use the total number of diapers in the box. Divide the total number by how many diapers the child goes through per day. The box contains 198 diapers. The box states the diapers are for a child weighing 22-37 lbs. The average child in this weight range is likely between 1 and 2 years old. A 2-year-old typically goes through 6 diapers per day, therefore, the diapers will last around 33 days (198/6). If the model output is too generic and not tailored enough to the image input To help the model tailor its response to the image(s), try asking it to describe the images before performing its reasoning task. Prompt Model response What is in common between these images? All four images are examples of landscape photography. Updated prompt Improved response First, describe what's in each image in detail. What's in common between these images? The first image shows a person packing a suitcase. The second image shows an empty hallway in a shopping mall. The third image shows a roll of toilet paper. The fourth image shows a dark and stormy sea. The commonality between these images is that they are all examples of liminal spaces. Liminal spaces are places or situations that are in between two other places or situations. They are often characterized by a sense of uncertainty or ambiguity. The first image shows the person in between being at home and being on a trip. The second image shows the hallway in between two stores. The third image is of toilet paper, which can represent a bathroom. A bathroom is a transitional space between private and public realms. It's a place where we prepare ourselves to enter or leave the outside world. The fourth image shows the sea in between being calm and being stormy. Liminal spaces can be both exciting and unsettling. They can be a place of new beginning or a place of endings. They can be a place of reflection or a place of action. You can also try explicitly asking the model to refer to what's in the image. Updated prompt Improved response What is in common between these images? Refer to what's in the images in your response. All of the images are related to travel. The first image shows a person packing a suitcase, the second image shows an empty hallway in a shopping mall, the third image shows a stack of toilet paper, and the fourth image shows a dark and stormy sea. These images are all related to travel because they are all things that people need or use when they travel. The first image shows a person packing a suitcase, which is something that people do when they are going on a trip. The second image shows an empty hallway in a shopping mall, which is a place where people often go to buy things for their trips. The third image shows a stack of toilet paper, which is something that people often use when they are traveling. The fourth image shows a dark and stormy sea, which is something that people often encounter when they are traveling. The same tips work for creative output from images. In the following example with the rabbit, explicitly asking the model to include aspects from the image can improve the output. Prompt Model response Can you write me a poem about this image? Little bunny, so soft and brown, Hopping through the field, up and down. Your ears so long, your tail so small, You're the cutest creature of them all. You nibble on the grass, so green and sweet, And hop away when you see me. 
But I'll always be your friend, Little bunny, until the very end. Updated prompt Improved response Can you write me a descriptive and dramatic poem about this image and include the location? Little bunny on the green, Ears so long, so keen. What do you see out there, In the distance, where you stare? Is it the ocean blue, Sparkling in the sun's hue? Or the waves that crash and roar, A rhythm that you adore? Little bunny, tell me true, What captures your attention so? Is it the salty breeze, Or the seagulls' cries that please? Perhaps it's the vastness of it all, A world so big and small. A place where dreams can roam, And adventures can be sown. Little bunny, I wonder what you dream, As you sit on the grass, so serene. Do you yearn to explore the deep, Or stay on land, where you can leap? Whatever it may be, little bunny, Keep that spark of wonder burning brightly. For in your dreams and aspirations, Lies a world waiting for your creation. Troubleshooting which part of the prompt failed It can be hard to know whether a prompt failed because the model didn't understand the image to begin with, or if it did understand the image but did not perform the correct reasoning steps afterward. To disambiguate between those reasons, ask the model to describe what's in the image. In the following example, if the model responds with a snack that seems surprising when paired with tea (e.g. popcorn), you can first troubleshoot to determine whether the model correctly recognized that the image contains tea. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? Describe what's in this image. Another strategy is to ask the model to explain its reasoning. That can help you narrow down which part of the reasoning broke down, if any. Prompt Prompt for troubleshooting What's a snack I can make in 1 minute that would go well with this? What's a snack I can make in 1 minute that would go well with this? Please explain why. What's next Try writing your own multimodal prompts using Google AI Studio . For information on using the Gemini Files API for uploading media files and including them in your prompts, see the Vision , Audio , and Document processing guides. For more guidance on prompt design, like tuning sampling parameters, see the Prompt strategies page. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
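A rough sketch of how the describe-then-ask troubleshooting flow above might look with the google-genai Python SDK (the image file name and the prompts are placeholders taken from the examples on this page):

```python
from google import genai
from google.genai import types

client = genai.Client()

with open("tea_set.jpg", "rb") as f:  # hypothetical image of a tea set
    image = types.Part.from_bytes(data=f.read(), mime_type="image/jpeg")

# Troubleshooting call: check what the model actually sees in the image.
diagnosis = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[image, "Describe what's in this image."],
)
print("Model's reading of the image:", diagnosis.text)

# Actual task, optionally asking the model to explain its reasoning.
answer = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[image, "What's a snack I can make in 1 minute that would go well with this? Please explain why."],
)
print(answer.text)
```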
text_content/docs_function-calling_1c69a1f6.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/function-calling?example=weather#step
2
  Title: Function calling with the Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Function calling with the Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Function calling with the Gemini API Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. Extend Capabilities: Use external tools to perform computations and extend the limitations of the model, such as using a calculator or creating charts. Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. Get Weather Schedule Meeting Create Chart Python from google import genai from google.genai import types # Define the function declaration for the model weather_function = { "name" : "get_current_temperature" , "description" : "Gets the current temperature for a given location." , "parameters" : { "type" : "object" , "properties" : { "location" : { "type" : "string" , "description" : "The city name, e.g. San Francisco" , }, }, "required" : [ "location" ], }, } # Configure the client and tools client = genai . Client () tools = types . Tool ( function_declarations = [ weather_function ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in London?" , config = config , ) # Check for a function call if response . candidates [ 0 ] . content . parts [ 0 ] . function_call : function_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call print ( f "Function to call: { function_call . name } " ) print ( f "Arguments: { function_call . args } " ) # In a real app, you would call your function here: # result = get_current_temperature(**function_call.args) else : print ( "No function call found in the response." ) print ( response . text ) JavaScript import { GoogleGenAI , Type } from '@google/genai' ; // Configure the client const ai = new GoogleGenAI ({}); // Define the function declaration for the model const weatherFunctionDeclaration = { name : 'get_current_temperature' , description : 'Gets the current temperature for a given location.' , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , description : 'The city name, e.g. San Francisco' , }, }, required : [ 'location' ], }, }; // Send request with function declarations const response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : "What's the temperature in London?" , config : { tools : [{ functionDeclarations : [ weatherFunctionDeclaration ] }], }, }); // Check for function calls in the response if ( response . functionCalls && response . functionCalls . length > 0 ) { const functionCall = response . functionCalls [ 0 ]; // Assuming one function call console . 
log ( `Function to call: ${ functionCall . name } ` ); console . log ( `Arguments: ${ JSON . stringify ( functionCall . args ) } ` ); // In a real app, you would call your actual function here: // const result = await getCurrentTemperature(functionCall.args); } else { console . log ( "No function call found in the response." ); console . log ( response . text ); } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [ { "role": "user", "parts": [ { "text": "What' \' 's the temperature in London?" } ] } ], "tools": [ { "functionDeclarations": [ { "name": "get_current_temperature", "description": "Gets the current temperature for a given location.", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city name, e.g. San Francisco" } }, "required": ["location"] } } ] } ] }' How function calling works Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: Define Function Declaration: Define the function declaration in your application code. Function Declarations describe the function's name, parameters, and purpose to the model. Call LLM with function declarations: Send user prompt along with the function declaration(s) to the model. It analyzes the request and determines if a function call would be helpful. If so, it responds with a structured JSON object. Execute Function Code (Your Responsibility): The Model does not execute the function itself. It's your application's responsibility to process the response and check for Function Call, if Yes : Extract the name and args of the function and execute the corresponding function in your application. No: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. It will use the result to generate a final, user-friendly response that incorporates the information from the function call. This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn ( parallel function calling ) and in sequence ( compositional function calling ). Step 1: Define a function declaration Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs. Python # Define a function that the model can call to control smart lights set_light_values_declaration = { "name" : "set_light_values" , "description" : "Sets the brightness and color temperature of a light." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "integer" , "description" : "Light level from 0 to 100. Zero is off and 100 is full brightness" , }, "color_temp" : { "type" : "string" , "enum" : [ "daylight" , "cool" , "warm" ], "description" : "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`." 
, }, }, "required" : [ "brightness" , "color_temp" ], }, } # This is the actual function that would be called based on the model's suggestion def set_light_values ( brightness : int , color_temp : str ) - > dict [ str , int | str ]: """Set the brightness and color temperature of a room light. (mock API). Args: brightness: Light level from 0 to 100. Zero is off and 100 is full brightness color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. Returns: A dictionary containing the set brightness and color temperature. """ return { "brightness" : brightness , "colorTemperature" : color_temp } JavaScript import { Type } from '@google/genai' ; // Define a function that the model can call to control smart lights const setLightValuesFunctionDeclaration = { name : 'set_light_values' , description : 'Sets the brightness and color temperature of a light.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'Light level from 0 to 100. Zero is off and 100 is full brightness' , }, color_temp : { type : Type . STRING , enum : [ 'daylight' , 'cool' , 'warm' ], description : 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.' , }, }, required : [ 'brightness' , 'color_temp' ], }, }; /** * Set the brightness and color temperature of a room light. (mock API) * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. * @return {Object} A dictionary containing the set brightness and color temperature. */ function setLightValues ( brightness , color_temp ) { return { brightness : brightness , colorTemperature : color_temp }; } Step 2: Call the model with function declarations Once you have defined your function declarations, you can prompt the model to use them. It analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion. Python from google.genai import types # Configure the client and tools client = genai . Client () tools = types . Tool ( function_declarations = [ set_light_values_declaration ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Define user prompt contents = [ types . Content ( role = "user" , parts = [ types . Part ( text = "Turn the lights down to a romantic level" )] ) ] # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = contents config = config , ) print ( response . candidates [ 0 ] . content . parts [ 0 ] . function_call ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Generation config with function declaration const config = { tools : [{ functionDeclarations : [ setLightValuesFunctionDeclaration ] }] }; // Configure the client const ai = new GoogleGenAI ({}); // Define user prompt const contents = [ { role : 'user' , parts : [{ text : 'Turn the lights down to a romantic level' }] } ]; // Send request with function declarations const response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( response . 
functionCalls [ 0 ]); The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question. Python id = None args = { 'color_temp' : 'warm' , 'brightness' : 25 } name = 'set_light_values' JavaScript { name : 'set_light_values' , args : { brightness : 25 , color_temp : 'warm' } } Step 3: Execute set_light_values function code Extract the function call details from the model's response, parse the arguments , and execute the set_light_values function. Python # Extract tool call details, it may not be in the first part. tool_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call if tool_call . name == "set_light_values" : result = set_light_values ( ** tool_call . args ) print ( f "Function execution result: { result } " ) JavaScript // Extract tool call details const tool_call = response . functionCalls [ 0 ] let result ; if ( tool_call . name === 'set_light_values' ) { result = setLightValues ( tool_call . args . brightness , tool_call . args . color_temp ); console . log ( `Function execution result: ${ JSON . stringify ( result ) } ` ); } Step 4: Create user friendly response with function result and call the model again Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. Python # Create a function response part function_response_part = types . Part . from_function_response ( name = tool_call . name , response = { "result" : result }, ) # Append function call and result of the function execution to contents contents . append ( response . candidates [ 0 ] . content ) # Append the content from the model's response. contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Create a function response part const function_response_part = { name : tool_call . name , response : { result } } // Append function call and result of the function execution to contents contents . push ( response . candidates [ 0 ]. content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); // Get the final response from the model const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); This completes the function calling flow. The model successfully used the set_light_values function to perform the request action of the user. Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations . You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. 
Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. Function calling with thinking Enabling "thinking" can improve function call performance by allowing the model to reason through a request before suggesting function calls. However, because the Gemini API is stateless, this reasoning context is lost between turns, which can reduce the quality of function calls as they require multiple turn requests. To preserve this context you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. To use thought signatures: Receive the signature: When thinking is enabled, the API response will include a thought_signature field containing an encrypted representation of the model's reasoning. Return the signature: When you send the function's execution result back to the server, include the thought_signature you received. This allows the model to restore its previous thinking context and will likely result in better function calling performance. Receiving signatures from the server Signatures are returned in the part after the model's thinking phase, which typically is a text or function call. Here are some examples of what thought signatures look like returned in each type of part, in response to the request "What's the weather in Lake Tahoe?" using the Get Weather example: Text part [{ "candidates" : [ { "content" : { "parts" : [ { "text" : "Here's what the weather in Lake Tahoe is today" , "thoughtSignature" : "ClcBVKhc7ru7KzUI7SrdUoIdAYLm/+i93aHjfIt4xHyAoO/G70tApxnK2ujBhOhC1PrRy1pkQa88fqFvpHNVd1HDjNLO7mkp6/hFwE+SPPEB3fh0hs4oM8MKhgIBVKhc7uIGvrS7i/T4HpfbnYrluFfWNjZ62gewqe4cVdR/Dlh+zbjtYmDD0gPZ+SuBO7vvHQdzsjePRP+2Y5XddX6LEf/cGGgakq8EhVvw/a6IVzUO6XmpHg2Ag1sl8E9+VFH/lC0R0ZuYdFWligtDuYwp5p5q3o59G0TtWeU2MC1y2MJfE9u/KWd313ldka80/X2W/xF2O/4djMp5G2WKcULfve75zeRCy0mc5iS3SB9mTH0cT6x0vtKjeBx50gcg+CQWtJcRuwTVzz54dmvmK9xvnqA8gKGw3DuaM9wfy5hyY7Qg0z3iyyWdP8T/lbjKim8IEQOk7O1vVwP1Ko7oMYH8JgA1CsoBAVSoXO6v4c5RSyd1cn6EIU0pEFQsjW7rYWPuZdOFq/tsGJT9BCfW7KGkPGwlNSq8jTJFvbcJ/DjtndISQYXwiXd2kGa5JfdS2Kh4zOxCxiWtOk+2nCc3+XQk2nonhO+esGJpkDdbbHZSqRgcUtYKq7q28iPFOQvOFyCiZNB7K86Z/6Hnagu2snSlN/BcTMaFGaWpcCClSUo4foRZn3WbNCoM8rcpD7qEJMp4a5baaSxyyeL1ZTGd2HLpFys/oiW6e3oAnhxuIysCwg==" } ] , "role" : "model" } , "index" : 0 } ] , # Remainder of response... 
Function call part [{ "candidates" : [ { "content" : { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiwBVKhc7nRyTi3HmggPD9iQiRc261f5jwuMdw3H/itDH0emsb9ZVo3Nwx9p6wpsAVSoXO5i8fDV4jBSBLoaWxB5zUdlGY6aIGp+I0oEnwRRSRQ1LOvrDlojEH8JE8HjiKXALdJrvNPiG+HY3GZEO8pZjEZtc3UoBUh7+SVyjK7Xolu7aRYYeUyzrCapoETWypER1jbrJXnFV23hCosBAVSoXO6oIPNJSmbuEDfGafOhuCSHkpr1yjTp35RXYqmCESzRzWf5+nFXLqncqeFo4ohoxbiYQVpVQbOZF81p8o9zg6xeRE7qMeOv+XN7enXGJ4/s3qNFQpfkSMqRdBITN1VpX7jyfEAjvxBNc7PDfDJZmEPY338ZIY5nFFcmzJSWjVrboFt2sMFv+A==" } ] , "role" : "model" } , "finishReason" : "STOP" , "index" : 0 } ] , # Remainder of response... You can confirm that you received a signature and see what a signature looks like using the following code: # Step 2: Call the model with function declarations # ...Generation config, Configure the client, and Define user prompt (No changes) # Send request with declarations (using a thinking model) response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents ) # See thought signatures for part in response . candidates [ 0 ] . content . parts : if part . thought_signature : print ( "Thought signature:" ) print ( part . thought_signature ) Returning signatures back to the server In order to return signatures back: You should return signatures along with their containing parts back to the server You shouldn't merge a part with a signature with another part which also contains a signature. The signature string is not concatenable You shouldn't merge one part with a signature with another part without a signature. This breaks the correct positioning of the thought represented by the signature. The code will remain the same as in Step 4 of the previous section. But in this case (as indicated in the comment below) you will return signatures to the model along with the result of the function execution so the model can incorporate the thoughts into its final response: Python # Step 4: Create user friendly response with function result and call the model again # ...Create a function response part (No change) # Append thought signatures, function call and result of the function execution to contents function_call_content = response . candidates [ 0 ] . content # Append the model's function call message, which includes thought signatures contents . append ( function_call_content ) contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Step 4: Create user friendly response with function result and call the model again // ...Create a function response part (No change) // Append thought signatures, function call and result of the function execution to contents const function_response_content = response . candidates [ 0 ]. content ; contents . push ( function_response_content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); The following shows what a request returning a thought signature may look like: [{ "contents" : [ { "role" : "user" , "parts" : [ { "text" : "what is the weather in Lake Tahoe?" 
} ] } , { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiIBVKhc7oDPpCaXyJKKssjqr4g3JNOSgJ/M2V+1THC1icsWCmwBVKhc7pBABbZ+zR3e9234WnWWS6GFXmf8IVwpnzjd5KYd7vyJbn/4vTorWBGayj/vbd9JPaZQjxdAIXhoE5mX/MDsQ7M9N/b0qJjHm39tYIBvS4sIWkMDHqTJqXGLzhhKtrTkfbV3RbaJEkQKmwEBVKhc7qVUgC3hfTXZLo9R3AJzUUIx50NKvJTb9B+UU+LBqgg7Nck1x5OpjWVS2R+SsveprIuYOruk2Y0H53J2OJF8qsxTdIq2si8DGW2V7WK8xyoJH5kbqd7drIw1jLb44b6lx4SMyB0VaULuTBki4d+Ljjg1tJTwR0IYMKqDLDZt9mheINsi0ZxcNjfpnDydRXdWbcSwzmK/wgqJAQFUqFzuKgNVElxs3cbO+xebr2IwcOro84nKTisi0tTp9bICPC9fTUhn3L+rvQWA+d3J1Za8at2bakrqiRj7BTh+CVO9fWQMAEQAs3ni0Z2hfaYG92tOD26E4IoZwyYEoWbfNudpH1fr5tEkyqnEGtWIh7H+XoZQ2DXeiOa+br7Zk88SrNE+trJMCogBAVSoXO5e9fBLg7hnbkmKsrzNLnQtLsQm1gNzjcjEC7nJYklYPp0KI2uGBE1PkM8XNsfllAfHVn7LzHcHNlbQ9pJ7QZTSIeG42goS971r5wNZwxaXwCTphClQh826eqJWo6A/28TtAVQWLhTx5ekbP7qb4nh1UblESZ1saxDQAEo4OKPbDzx5BgqKAQFUqFzuVyjNm5i0wN8hTDnKjfpDroEpPPTs531iFy9BOX+xDCdGHy8D+osFpaoBq6TFekQQbz4hIoUR1YEcP4zI80/cNimEeb9IcFxZTTxiNrbhbbcv0969DSMWhB+ZEqIz4vuw4GLe/xcUvqhlChQwFdgIbdOQHSHpatn5uDlktnP/bi26nKuXIwo0AVSoXO7US22OUH7d1f4abNPI0IyAvhqkPp12rbtWLx9vkOtojE8IP+xCfYtIFuZIzRNZqA==" } ] , "role" : "model" } , { "role" : "user" , "parts" : [ { "functionResponse" : { "name" : "getWeather" , "response" : { "response" : { "stringValue" : "Sunny and hot. 90 degrees Fahrenheit" } } } } ] } ] , # Remainder of request... Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page. Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . 
OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' } }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) = > ` ${ key } = ${ val } ` ) . join ( ', ' ); console . log ( ` ${ fn . name } ( ${ args } )` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which automatically converts Python functions to declarations, handles the function call execution and response cycle for you. Following is an example for the disco use case. Note: Automatic Function Calling is a Python SDK only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) - > dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. 
""" return { "status" : f "Disco ball powered { 'on' if power else 'off' } " } def start_music_impl ( energetic : bool , loud : bool ) - > dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) - > dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to this place into party!" , config = config , ) print ( " \n Example 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task. import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) - > dict : """Gets the current weather temperature for a given location.""" print ( f "Tool Call: get_weather_forecast(location= { location } )" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) - > dict : """Sets the thermostat to a desired temperature.""" print ( f "Tool Call: set_thermostat_temperature(temperature= { temperature } )" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. 
The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . I 've set the thermostat to 20°C. JavaScript This example shows how to use the JavaScript/TypeScript SDK to do compositional function calling using a manual execution loop. import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type . NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the code orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . It 's 25°C in London, so I' ve set the thermostat to 20 ° C . 
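If you are using Python but prefer to stay in control of execution rather than rely on automatic function calling, the manual loop from the JavaScript example can be written in Python as well. The following is an illustrative sketch only, with stub tool implementations standing in for real APIs; it reuses the SDK calls shown elsewhere on this page (function_calls, Part.from_function_response, and AutomaticFunctionCallingConfig):

```python
from google import genai
from google.genai import types

client = genai.Client()

# Stub implementations (same behavior as the examples above).
def get_weather_forecast(location: str) -> dict:
    """Gets the current weather temperature for a given location (dummy response)."""
    return {"temperature": 25, "unit": "celsius"}

def set_thermostat_temperature(temperature: int) -> dict:
    """Sets the thermostat to a desired temperature (dummy response)."""
    return {"status": "success"}

tool_functions = {
    "get_weather_forecast": get_weather_forecast,
    "set_thermostat_temperature": set_thermostat_temperature,
}

# Let the SDK build the declarations, but disable automatic execution
# so the loop below stays in control.
config = types.GenerateContentConfig(
    tools=[get_weather_forecast, set_thermostat_temperature],
    automatic_function_calling=types.AutomaticFunctionCallingConfig(disable=True),
)

contents = [
    types.Content(
        role="user",
        parts=[types.Part(text="If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C.")],
    )
]

while True:
    response = client.models.generate_content(
        model="gemini-2.5-flash", contents=contents, config=config
    )
    if not response.function_calls:
        print(response.text)  # final, user-facing answer
        break
    call = response.function_calls[0]
    result = tool_functions[call.name](**call.args)
    # Append the model's function-call turn and our function response.
    contents.append(response.candidates[0].content)
    contents.append(
        types.Content(
            role="user",
            parts=[types.Part.from_function_response(name=call.name, response={"result": result})],
        )
    )
```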
Compositional function calling is a native Live API feature. This means the Live API can handle function calling similarly to the Python SDK. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? ` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , tools = tools , modality = "AUDIO" ) Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here. toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK automatically converts the Python function to declarations, handles the function call execution and the response cycle for you. The Python SDK then automatically: Detects function call responses from the model. Calls the corresponding Python function in your code. Sends the function response back to the model. Returns the model's final text response. 
To use this, define your function with type hints and a docstring, and then pass the function itself (not a JSON declaration) as a tool: Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) - > dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration Automatic schema extraction from Python functions doesn't work in all cases. For example, it doesn't handle cases where you describe the fields of a nested dictionary-object. The API is able to describe any of the following types: Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | dict [ str , 'AllowedType' ]) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools at the same time, combining native tools with function calling. Here's an example that enables two tools, Grounding with Google Search and code execution , in a request using the Live API . Note: Multi-tool use is a Live API-only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! ` ; const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } // not defined here. 
]; // Execute the prompt with specified tools in audio modality await run ( prompt , { tools : tools , modality : "AUDIO" }); Python developers can try this out in the Live API Tool Use notebook . Model context protocol (MCP) Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. The Gemini SDKs have built-in support for the MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDK can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until no more tool calls are made by the model. Here, you can find an example of how to use a local MCP server with Gemini and mcp SDK. Python Make sure the latest version of the mcp SDK is installed on your platform of choice. pip install mcp Note: Python supports automatic tool calling by passing in the ClientSession into the tools parameters. If you want to disable it, you can provide automatic_function_calling with disabled True . import os import asyncio from datetime import datetime from mcp import ClientSession , StdioServerParameters from mcp.client.stdio import stdio_client from google import genai client = genai . Client () # Create server parameters for stdio connection server_params = StdioServerParameters ( command = "npx" , # Executable args = [ "-y" , "@philschmid/weather-mcp" ], # MCP Server env = None , # Optional environment variables ) async def run (): async with stdio_client ( server_params ) as ( read , write ): async with ClientSession ( read , write ) as session : # Prompt to get the weather for the current day in London. prompt = f "What is the weather in London in { datetime . now () . strftime ( '%Y-%m- %d ' ) } ?" # Initialize the connection between client and server await session . initialize () # Send request to the model with MCP function declarations response = await client . aio . models . generate_content ( model = "gemini-2.5-flash" , contents = prompt , config = genai . types . GenerateContentConfig ( temperature = 0 , tools = [ session ], # uses the session, will automatically call the tool # Uncomment if you **don't** want the SDK to automatically call the tool # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( # disable=True # ), ), ) print ( response . text ) # Start the asyncio event loop and run the main function asyncio . run ( run ()) JavaScript Make sure the latest version of the mcp SDK is installed on your platform of choice. npm install @modelcontextprotocol/sdk Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool . If you want to disable it, you can provide automaticFunctionCalling with disabled true . 
import { GoogleGenAI , FunctionCallingConfigMode , mcpToTool } from '@google/genai' ; import { Client } from "@modelcontextprotocol/sdk/client/index.js" ; import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js" ; // Create server parameters for stdio connection const serverParams = new StdioClientTransport ({ command : "npx" , // Executable args : [ "-y" , "@philschmid/weather-mcp" ] // MCP Server }); const client = new Client ( { name : "example-client" , version : "1.0.0" } ); // Configure the client const ai = new GoogleGenAI ({}); // Initialize the connection between client and server await client . connect ( serverParams ); // Send request to the model with MCP tools const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : `What is the weather in London in ${ new Date (). toLocaleDateString () } ?` , config : { tools : [ mcpToTool ( client )], // uses the session, will automatically call the tool // Uncomment if you **don't** want the sdk to automatically call the tool // automaticFunctionCalling: { // disable: true, // }, }, }); console . log ( response . text ) // Close the connection await client . close (); Limitations with built-in MCP support Built-in MCP support is an experimental feature in our SDKs and has the following limitations: Only tools are supported, not resources or prompts It is available for the Python and JavaScript/TypeScript SDKs. Breaking changes might occur in future releases. Manual integration of MCP servers is always an option if these limitations affect what you're building. Supported models This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page. Model Function Calling Parallel Function Calling Compositional Function Calling Gemini 2.5 Pro ✔️ ✔️ ✔️ Gemini 2.5 Flash ✔️ ✔️ ✔️ Gemini 2.5 Flash-Lite ✔️ ✔️ ✔️ Gemini 2.0 Flash ✔️ ✔️ ✔️ Gemini 2.0 Flash-Lite X X X Best practices Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments. Naming: Use descriptive function names (without spaces, periods, or dashes). Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum. Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools. Prompt Engineering: Provide context: Tell the model its role (e.g., "You are a helpful weather assistant."). Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts."). Encourage clarification: Instruct the model to ask clarifying questions if needed. Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls. Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it. Error Handling : Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. 
Return informative error messages that the model can use to generate helpful responses to the user. Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls. Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, or break down complex tasks into smaller, more focused function sets. Notes and limitations Only a subset of the OpenAPI schema is supported. Supported parameter types in Python are limited. Automatic function calling is a Python SDK feature only. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-10 UTC.
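As a small illustration of the error-handling recommendation above, the sketch below shows a tool function that catches failures from a hypothetical weather lookup and returns an informative error payload the model can relay to the user; the helper function and its failure mode are invented for the example:

```python
import random

def _fetch_temperature_from_weather_service(location: str) -> float:
    """Stand-in for a real weather API call; fails randomly to exercise error handling."""
    if random.random() < 0.5:
        raise ConnectionError("weather service timed out")
    return 25.0

def get_current_temperature(location: str) -> dict:
    """Tool function that never raises: failures become informative payloads."""
    try:
        temperature = _fetch_temperature_from_weather_service(location)
        return {"temperature": temperature, "unit": "Celsius"}
    except Exception as exc:
        # The model can read this message and explain the problem to the user.
        return {"error": f"Could not fetch weather for '{location}': {exc}"}

print(get_current_temperature("London"))
```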
 
2
  Title: Function calling with the Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Function calling with the Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Function calling with the Gemini API Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. Extend Capabilities: Use external tools to perform computations and extend the limitations of the model, such as using a calculator or creating charts. Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. Get Weather Schedule Meeting Create Chart Python from google import genai from google.genai import types # Define the function declaration for the model weather_function = { "name" : "get_current_temperature" , "description" : "Gets the current temperature for a given location." , "parameters" : { "type" : "object" , "properties" : { "location" : { "type" : "string" , "description" : "The city name, e.g. San Francisco" , }, }, "required" : [ "location" ], }, } # Configure the client and tools client = genai . Client () tools = types . Tool ( function_declarations = [ weather_function ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in London?" , config = config , ) # Check for a function call if response . candidates [ 0 ] . content . parts [ 0 ] . function_call : function_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call print ( f "Function to call: { function_call . name } " ) print ( f "Arguments: { function_call . args } " ) # In a real app, you would call your function here: # result = get_current_temperature(**function_call.args) else : print ( "No function call found in the response." ) print ( response . text ) JavaScript import { GoogleGenAI , Type } from '@google/genai' ; // Configure the client const ai = new GoogleGenAI ({}); // Define the function declaration for the model const weatherFunctionDeclaration = { name : 'get_current_temperature' , description : 'Gets the current temperature for a given location.' , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , description : 'The city name, e.g. San Francisco' , }, }, required : [ 'location' ], }, }; // Send request with function declarations const response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : "What's the temperature in London?" , config : { tools : [{ functionDeclarations : [ weatherFunctionDeclaration ] }], }, }); // Check for function calls in the response if ( response . functionCalls && response . functionCalls . length > 0 ) { const functionCall = response . functionCalls [ 0 ]; // Assuming one function call console . 
log ( `Function to call: ${ functionCall . name } ` ); console . log ( `Arguments: ${ JSON . stringify ( functionCall . args ) } ` ); // In a real app, you would call your actual function here: // const result = await getCurrentTemperature(functionCall.args); } else { console . log ( "No function call found in the response." ); console . log ( response . text ); } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [ { "role": "user", "parts": [ { "text": "What' \' 's the temperature in London?" } ] } ], "tools": [ { "functionDeclarations": [ { "name": "get_current_temperature", "description": "Gets the current temperature for a given location.", "parameters": { "type": "object", "properties": { "location": { "type": "string", "description": "The city name, e.g. San Francisco" } }, "required": ["location"] } } ] } ] }' How function calling works Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: Define Function Declaration: Define the function declaration in your application code. Function Declarations describe the function's name, parameters, and purpose to the model. Call LLM with function declarations: Send user prompt along with the function declaration(s) to the model. It analyzes the request and determines if a function call would be helpful. If so, it responds with a structured JSON object. Execute Function Code (Your Responsibility): The Model does not execute the function itself. It's your application's responsibility to process the response and check for Function Call, if Yes : Extract the name and args of the function and execute the corresponding function in your application. No: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. It will use the result to generate a final, user-friendly response that incorporates the information from the function call. This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn ( parallel function calling ) and in sequence ( compositional function calling ). Step 1: Define a function declaration Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs. Python # Define a function that the model can call to control smart lights set_light_values_declaration = { "name" : "set_light_values" , "description" : "Sets the brightness and color temperature of a light." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "integer" , "description" : "Light level from 0 to 100. Zero is off and 100 is full brightness" , }, "color_temp" : { "type" : "string" , "enum" : [ "daylight" , "cool" , "warm" ], "description" : "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`." 
, }, }, "required" : [ "brightness" , "color_temp" ], }, } # This is the actual function that would be called based on the model's suggestion def set_light_values ( brightness : int , color_temp : str ) - > dict [ str , int | str ]: """Set the brightness and color temperature of a room light. (mock API). Args: brightness: Light level from 0 to 100. Zero is off and 100 is full brightness color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. Returns: A dictionary containing the set brightness and color temperature. """ return { "brightness" : brightness , "colorTemperature" : color_temp } JavaScript import { Type } from '@google/genai' ; // Define a function that the model can call to control smart lights const setLightValuesFunctionDeclaration = { name : 'set_light_values' , description : 'Sets the brightness and color temperature of a light.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'Light level from 0 to 100. Zero is off and 100 is full brightness' , }, color_temp : { type : Type . STRING , enum : [ 'daylight' , 'cool' , 'warm' ], description : 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.' , }, }, required : [ 'brightness' , 'color_temp' ], }, }; /** * Set the brightness and color temperature of a room light. (mock API) * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. * @return {Object} A dictionary containing the set brightness and color temperature. */ function setLightValues ( brightness , color_temp ) { return { brightness : brightness , colorTemperature : color_temp }; } Step 2: Call the model with function declarations Once you have defined your function declarations, you can prompt the model to use them. It analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion. Python from google.genai import types # Configure the client and tools client = genai . Client () tools = types . Tool ( function_declarations = [ set_light_values_declaration ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Define user prompt contents = [ types . Content ( role = "user" , parts = [ types . Part ( text = "Turn the lights down to a romantic level" )] ) ] # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = contents config = config , ) print ( response . candidates [ 0 ] . content . parts [ 0 ] . function_call ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Generation config with function declaration const config = { tools : [{ functionDeclarations : [ setLightValuesFunctionDeclaration ] }] }; // Configure the client const ai = new GoogleGenAI ({}); // Define user prompt const contents = [ { role : 'user' , parts : [{ text : 'Turn the lights down to a romantic level' }] } ]; // Send request with function declarations const response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( response . 
functionCalls [ 0 ]); The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question. Python id = None args = { 'color_temp' : 'warm' , 'brightness' : 25 } name = 'set_light_values' JavaScript { name : 'set_light_values' , args : { brightness : 25 , color_temp : 'warm' } } Step 3: Execute set_light_values function code Extract the function call details from the model's response, parse the arguments , and execute the set_light_values function. Python # Extract tool call details, it may not be in the first part. tool_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call if tool_call . name == "set_light_values" : result = set_light_values ( ** tool_call . args ) print ( f "Function execution result: { result } " ) JavaScript // Extract tool call details const tool_call = response . functionCalls [ 0 ] let result ; if ( tool_call . name === 'set_light_values' ) { result = setLightValues ( tool_call . args . brightness , tool_call . args . color_temp ); console . log ( `Function execution result: ${ JSON . stringify ( result ) } ` ); } Step 4: Create user friendly response with function result and call the model again Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. Python # Create a function response part function_response_part = types . Part . from_function_response ( name = tool_call . name , response = { "result" : result }, ) # Append function call and result of the function execution to contents contents . append ( response . candidates [ 0 ] . content ) # Append the content from the model's response. contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Create a function response part const function_response_part = { name : tool_call . name , response : { result } } // Append function call and result of the function execution to contents contents . push ( response . candidates [ 0 ]. content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); // Get the final response from the model const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); This completes the function calling flow. The model successfully used the set_light_values function to perform the request action of the user. Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations . You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. 
Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. You can also construct FunctionDeclarations from Python functions directly using types.FunctionDeclaration.from_callable(client=client, callable=your_function) . Function calling with thinking Enabling " thinking " can improve function call performance by allowing the model to reason through a request before suggesting function calls. Because the Gemini API is stateless, the model's reasoning context will be lost between turns in a multi-turn conversation. To preserve this context, you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. The standard pattern for multi-turn tool use is to append the model's complete previous response to the conversation history. The content object includes the thought_signatures automatically. If you follow this pattern, no code changes are required. Manually managing thought signatures If you modify the conversation history manually instead of sending the complete previous response, and you still want to benefit from thinking, you must correctly handle the thought_signature included in the model's turn. Follow these rules to ensure the model's context is preserved: Always send the thought_signature back to the model inside its original Part . Don't merge a Part containing a signature with one that does not. This breaks the positional context of the thought. Don't combine two Parts that both contain signatures, as the signature strings cannot be merged. A minimal sketch of this pattern appears at the end of this section. Inspecting Thought Signatures While not necessary for implementation, you can inspect the response to see the thought_signature for debugging or educational purposes. Python import base64 # After receiving a response from a model with thinking enabled # response = client.models.generate_content(...) # The signature is attached to the response part containing the function call part = response . candidates [ 0 ] . content . parts [ 0 ] if part . thought_signature : print ( base64 . b64encode ( part . thought_signature ) . decode ( "utf-8" )) JavaScript // After receiving a response from a model with thinking enabled // const response = await ai.models.generateContent(...) // The signature is attached to the response part containing the function call const part = response . candidates [ 0 ]. content . parts [ 0 ]; if ( part . thoughtSignature ) { console . log ( part . thoughtSignature ); } Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page.
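The following is a minimal sketch (not part of the original page) of the safe manual pattern described above: append the model's returned Content unmodified so each Part keeps its thought_signature, then add the function result as a separate user turn. It reuses the get_current_temperature declaration (weather_function) from the Get Weather example above; the hard-coded temperature result is a placeholder for illustration only. Python
# Minimal sketch: preserve thought signatures when managing history manually.
# Assumes the `weather_function` declaration from the Get Weather example above;
# the returned temperature below is a made-up placeholder.
from google import genai
from google.genai import types

client = genai.Client()
tools = types.Tool(function_declarations=[weather_function])
config = types.GenerateContentConfig(tools=[tools])

history = [
    types.Content(role="user", parts=[types.Part(text="What's the temperature in London?")])
]
response = client.models.generate_content(
    model="gemini-2.5-flash", contents=history, config=config
)

# Append the model's turn exactly as returned; its Parts already carry any
# thought_signature, so don't split, merge, or rebuild them.
history.append(response.candidates[0].content)

# Append the tool result as a new user turn in its own Part.
history.append(
    types.Content(
        role="user",
        parts=[
            types.Part.from_function_response(
                name="get_current_temperature",
                response={"result": {"temperature": 18, "unit": "celsius"}},  # placeholder
            )
        ],
    )
)

final = client.models.generate_content(
    model="gemini-2.5-flash", contents=history, config=config
)
print(final.text)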
Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' } }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . 
join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) = > ` ${ key } = ${ val } ` ) . join ( ', ' ); console . log ( ` ${ fn . name } ( ${ args } )` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which automatically converts Python functions to declarations, handles the function call execution and response cycle for you. Following is an example for the disco use case. Note: Automatic Function Calling is a Python SDK only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) - > dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. """ return { "status" : f "Disco ball powered { 'on' if power else 'off' } " } def start_music_impl ( energetic : bool , loud : bool ) - > dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) - > dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to this place into party!" , config = config , ) print ( " \n Example 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. 
The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task. import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) - > dict : """Gets the current weather temperature for a given location.""" print ( f "Tool Call: get_weather_forecast(location= { location } )" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) - > dict : """Sets the thermostat to a desired temperature.""" print ( f "Tool Call: set_thermostat_temperature(temperature= { temperature } )" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . I 've set the thermostat to 20°C. JavaScript This example shows how to use the JavaScript/TypeScript SDK to do compositional function calling using a manual execution loop. import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type .
NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . It 's 25°C in London, so I' ve set the thermostat to 20 ° C . Compositional function calling is a native Live API feature. This means Live API can handle the function calling similar to the Python SDK. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you write run some python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you write run some python code to turn on the lights, wait 10s and then turn off the lights? ` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , tools = tools , modality = "AUDIO" ) Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the. function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. 
If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here. toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK converts these functions into declarations, manages the function call execution, and handles the response cycle for you. Define your function with type hints and a docstring. For optimal results, it is recommended to use Google-style docstrings. The SDK will then automatically: Detect function call responses from the model. Call the corresponding Python function in your code. Send the function's response back to the model. Return the model's final text response. The SDK currently does not parse argument descriptions into the property description slots of the generated function declaration. Instead, it sends the entire docstring as the top-level function description. Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) - > dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration The API is able to describe any of the following types. Pydantic types are allowed, as long as the fields defined on them are also composed of allowed types. Dict types (like dict[str: int] ) are not well supported here, don't use them. Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | pydantic . 
BaseModel ) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools, combining native tools with function calling, at the same time. Here's an example that enables two tools, Grounding with Google Search and code execution , in a request using the Live API . Note: Multi-tool use is a Live API-only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! ` ; const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } // not defined here. ]; // Execute the prompt with specified tools in audio modality await run ( prompt , { tools : tools , modality : "AUDIO" }); Python developers can try this out in the Live API Tool Use notebook . Model context protocol (MCP) Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. The Gemini SDKs have built-in support for MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDKs can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until no more tool calls are made by the model. Here, you can find an example of how to use a local MCP server with Gemini and the mcp SDK. Python Make sure the latest version of the mcp SDK is installed on your platform of choice. pip install mcp Note: Python supports automatic tool calling by passing in the ClientSession into the tools parameters. If you want to disable it, you can provide automatic_function_calling with disable=True . import os import asyncio from datetime import datetime from mcp import ClientSession , StdioServerParameters from mcp.client.stdio import stdio_client from google import genai client = genai .
Client () # Create server parameters for stdio connection server_params = StdioServerParameters ( command = "npx" , # Executable args = [ "-y" , "@philschmid/weather-mcp" ], # MCP Server env = None , # Optional environment variables ) async def run (): async with stdio_client ( server_params ) as ( read , write ): async with ClientSession ( read , write ) as session : # Prompt to get the weather for the current day in London. prompt = f "What is the weather in London in { datetime . now () . strftime ( '%Y-%m-%d' ) } ?" # Initialize the connection between client and server await session . initialize () # Send request to the model with MCP function declarations response = await client . aio . models . generate_content ( model = "gemini-2.5-flash" , contents = prompt , config = genai . types . GenerateContentConfig ( temperature = 0 , tools = [ session ], # uses the session, will automatically call the tool # Uncomment if you **don't** want the SDK to automatically call the tool # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( # disable=True # ), ), ) print ( response . text ) # Start the asyncio event loop and run the main function asyncio . run ( run ()) JavaScript Make sure the latest version of the mcp SDK is installed on your platform of choice. npm install @modelcontextprotocol/sdk Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool . If you want to disable it, you can provide automaticFunctionCalling with disable set to true . import { GoogleGenAI , FunctionCallingConfigMode , mcpToTool } from '@google/genai' ; import { Client } from "@modelcontextprotocol/sdk/client/index.js" ; import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js" ; // Create server parameters for stdio connection const serverParams = new StdioClientTransport ({ command : "npx" , // Executable args : [ "-y" , "@philschmid/weather-mcp" ] // MCP Server }); const client = new Client ( { name : "example-client" , version : "1.0.0" } ); // Configure the client const ai = new GoogleGenAI ({}); // Initialize the connection between client and server await client . connect ( serverParams ); // Send request to the model with MCP tools const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : `What is the weather in London in ${ new Date (). toLocaleDateString () } ?` , config : { tools : [ mcpToTool ( client )], // uses the session, will automatically call the tool // Uncomment if you **don't** want the sdk to automatically call the tool // automaticFunctionCalling: { // disable: true, // }, }, }); console . log ( response . text ) // Close the connection await client . close (); Limitations with built-in MCP support Built-in MCP support is an experimental feature in our SDKs and has the following limitations: Only tools are supported, not resources or prompts. It is available in the Python and JavaScript/TypeScript SDKs. Breaking changes might occur in future releases. Manual integration of MCP servers is always an option if these limitations affect what you're building. Supported models This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page.
Model Function Calling Parallel Function Calling Compositional Function Calling Gemini 2.5 Pro ✔️ ✔️ ✔️ Gemini 2.5 Flash ✔️ ✔️ ✔️ Gemini 2.5 Flash-Lite ✔️ ✔️ ✔️ Gemini 2.0 Flash ✔️ ✔️ ✔️ Gemini 2.0 Flash-Lite X X X Best practices Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments. Naming: Use descriptive function names (without spaces, periods, or dashes). Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum. Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools. Prompt Engineering: Provide context: Tell the model its role (e.g., "You are a helpful weather assistant."). Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts."). Encourage clarification: Instruct the model to ask clarifying questions if needed. Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls. Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it. Error Handling : Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. Return informative error messages that the model can use to generate helpful responses to the user. Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls. Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, break down complex tasks into smaller, more focused function sets. Notes and limitations Only a subset of the OpenAPI schema is supported. Supported parameter types in Python are limited. Automatic function calling is a Python SDK feature only. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
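As a supplement to the Error Handling and Validation best practices listed above, here is a minimal, hypothetical sketch (not part of the original page) of a tool function that validates its input and returns an informative error payload the model can relay to the user instead of raising. The weather lookup and its failure modes are assumptions for illustration. Python
# Hypothetical sketch of the error-handling best practice: return structured,
# informative errors so the model can explain the failure to the user.
def get_weather_forecast(location: str) -> dict:
    """Gets the current temperature for a given location (mock external call)."""
    try:
        if not location or not location.strip():
            # Validate inputs before calling the external service.
            return {"error": "Missing 'location'. Please provide a city name, e.g. 'London'."}
        # result = weather_service.lookup(location)  # hypothetical external API call
        return {"temperature": 25, "unit": "celsius"}  # placeholder result
    except Exception as exc:  # replace with the external service's specific exceptions
        # Never let the exception escape; describe it so the model can respond helpfully.
        return {"error": f"Weather lookup failed for '{location}': {exc}"}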
text_content/docs_function-calling_8330f2c7.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/function-calling#step-2
2
  Title: Function calling with the Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Function calling with the Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Function calling with the Gemini API Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. Extend Capabilities: Use external tools to perform computations and extend the limitations of the model, such as using a calculator or creating charts. Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. Get Weather Schedule Meeting Create Chart How function calling works Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: Define Function Declaration: Define the function declaration in your application code. Function Declarations describe the function's name, parameters, and purpose to the model. Call LLM with function declarations: Send user prompt along with the function declaration(s) to the model. It analyzes the request and determines if a function call would be helpful. If so, it responds with a structured JSON object. Execute Function Code (Your Responsibility): The Model does not execute the function itself. It's your application's responsibility to process the response and check for Function Call, if Yes : Extract the name and args of the function and execute the corresponding function in your application. No: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. It will use the result to generate a final, user-friendly response that incorporates the information from the function call. This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn ( parallel function calling ) and in sequence ( compositional function calling ). Step 1: Define a function declaration Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs. Python # Define a function that the model can call to control smart lights set_light_values_declaration = { "name" : "set_light_values" , "description" : "Sets the brightness and color temperature of a light." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "integer" , "description" : "Light level from 0 to 100. 
Zero is off and 100 is full brightness" , }, "color_temp" : { "type" : "string" , "enum" : [ "daylight" , "cool" , "warm" ], "description" : "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`." , }, }, "required" : [ "brightness" , "color_temp" ], }, } # This is the actual function that would be called based on the model's suggestion def set_light_values ( brightness : int , color_temp : str ) - > dict [ str , int | str ]: """Set the brightness and color temperature of a room light. (mock API). Args: brightness: Light level from 0 to 100. Zero is off and 100 is full brightness color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. Returns: A dictionary containing the set brightness and color temperature. """ return { "brightness" : brightness , "colorTemperature" : color_temp } JavaScript import { Type } from '@google/genai' ; // Define a function that the model can call to control smart lights const setLightValuesFunctionDeclaration = { name : 'set_light_values' , description : 'Sets the brightness and color temperature of a light.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'Light level from 0 to 100. Zero is off and 100 is full brightness' , }, color_temp : { type : Type . STRING , enum : [ 'daylight' , 'cool' , 'warm' ], description : 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.' , }, }, required : [ 'brightness' , 'color_temp' ], }, }; /** * Set the brightness and color temperature of a room light. (mock API) * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. * @return {Object} A dictionary containing the set brightness and color temperature. */ function setLightValues ( brightness , color_temp ) { return { brightness : brightness , colorTemperature : color_temp }; } Step 2: Call the model with function declarations Once you have defined your function declarations, you can prompt the model to use them. It analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion. Python from google.genai import types # Configure the client and tools client = genai . Client () tools = types . Tool ( function_declarations = [ set_light_values_declaration ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Define user prompt contents = [ types . Content ( role = "user" , parts = [ types . Part ( text = "Turn the lights down to a romantic level" )] ) ] # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = contents config = config , ) print ( response . candidates [ 0 ] . content . parts [ 0 ] . function_call ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Generation config with function declaration const config = { tools : [{ functionDeclarations : [ setLightValuesFunctionDeclaration ] }] }; // Configure the client const ai = new GoogleGenAI ({}); // Define user prompt const contents = [ { role : 'user' , parts : [{ text : 'Turn the lights down to a romantic level' }] } ]; // Send request with function declarations const response = await ai . models . 
generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( response . functionCalls [ 0 ]); The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question. Python id = None args = { 'color_temp' : 'warm' , 'brightness' : 25 } name = 'set_light_values' JavaScript { name : 'set_light_values' , args : { brightness : 25 , color_temp : 'warm' } } Step 3: Execute set_light_values function code Extract the function call details from the model's response, parse the arguments , and execute the set_light_values function. Python # Extract tool call details, it may not be in the first part. tool_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call if tool_call . name == "set_light_values" : result = set_light_values ( ** tool_call . args ) print ( f "Function execution result: { result } " ) JavaScript // Extract tool call details const tool_call = response . functionCalls [ 0 ] let result ; if ( tool_call . name === 'set_light_values' ) { result = setLightValues ( tool_call . args . brightness , tool_call . args . color_temp ); console . log ( `Function execution result: ${ JSON . stringify ( result ) } ` ); } Step 4: Create user friendly response with function result and call the model again Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. Python # Create a function response part function_response_part = types . Part . from_function_response ( name = tool_call . name , response = { "result" : result }, ) # Append function call and result of the function execution to contents contents . append ( response . candidates [ 0 ] . content ) # Append the content from the model's response. contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Create a function response part const function_response_part = { name : tool_call . name , response : { result } } // Append function call and result of the function execution to contents contents . push ( response . candidates [ 0 ]. content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); // Get the final response from the model const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); This completes the function calling flow. The model successfully used the set_light_values function to perform the request action of the user. Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations . You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. 
Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. Function calling with thinking Enabling "thinking" can improve function call performance by allowing the model to reason through a request before suggesting function calls. However, because the Gemini API is stateless, this reasoning context is lost between turns, which can reduce the quality of function calls as they require multiple turn requests. To preserve this context you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. To use thought signatures: Receive the signature: When thinking is enabled, the API response will include a thought_signature field containing an encrypted representation of the model's reasoning. Return the signature: When you send the function's execution result back to the server, include the thought_signature you received. This allows the model to restore its previous thinking context and will likely result in better function calling performance. Receiving signatures from the server Signatures are returned in the part after the model's thinking phase, which typically is a text or function call. Here are some examples of what thought signatures look like returned in each type of part, in response to the request "What's the weather in Lake Tahoe?" using the Get Weather example: Text part [{ "candidates" : [ { "content" : { "parts" : [ { "text" : "Here's what the weather in Lake Tahoe is today" , "thoughtSignature" : "ClcBVKhc7ru7KzUI7SrdUoIdAYLm/+i93aHjfIt4xHyAoO/G70tApxnK2ujBhOhC1PrRy1pkQa88fqFvpHNVd1HDjNLO7mkp6/hFwE+SPPEB3fh0hs4oM8MKhgIBVKhc7uIGvrS7i/T4HpfbnYrluFfWNjZ62gewqe4cVdR/Dlh+zbjtYmDD0gPZ+SuBO7vvHQdzsjePRP+2Y5XddX6LEf/cGGgakq8EhVvw/a6IVzUO6XmpHg2Ag1sl8E9+VFH/lC0R0ZuYdFWligtDuYwp5p5q3o59G0TtWeU2MC1y2MJfE9u/KWd313ldka80/X2W/xF2O/4djMp5G2WKcULfve75zeRCy0mc5iS3SB9mTH0cT6x0vtKjeBx50gcg+CQWtJcRuwTVzz54dmvmK9xvnqA8gKGw3DuaM9wfy5hyY7Qg0z3iyyWdP8T/lbjKim8IEQOk7O1vVwP1Ko7oMYH8JgA1CsoBAVSoXO6v4c5RSyd1cn6EIU0pEFQsjW7rYWPuZdOFq/tsGJT9BCfW7KGkPGwlNSq8jTJFvbcJ/DjtndISQYXwiXd2kGa5JfdS2Kh4zOxCxiWtOk+2nCc3+XQk2nonhO+esGJpkDdbbHZSqRgcUtYKq7q28iPFOQvOFyCiZNB7K86Z/6Hnagu2snSlN/BcTMaFGaWpcCClSUo4foRZn3WbNCoM8rcpD7qEJMp4a5baaSxyyeL1ZTGd2HLpFys/oiW6e3oAnhxuIysCwg==" } ] , "role" : "model" } , "index" : 0 } ] , # Remainder of response... 
Function call part [{ "candidates" : [ { "content" : { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiwBVKhc7nRyTi3HmggPD9iQiRc261f5jwuMdw3H/itDH0emsb9ZVo3Nwx9p6wpsAVSoXO5i8fDV4jBSBLoaWxB5zUdlGY6aIGp+I0oEnwRRSRQ1LOvrDlojEH8JE8HjiKXALdJrvNPiG+HY3GZEO8pZjEZtc3UoBUh7+SVyjK7Xolu7aRYYeUyzrCapoETWypER1jbrJXnFV23hCosBAVSoXO6oIPNJSmbuEDfGafOhuCSHkpr1yjTp35RXYqmCESzRzWf5+nFXLqncqeFo4ohoxbiYQVpVQbOZF81p8o9zg6xeRE7qMeOv+XN7enXGJ4/s3qNFQpfkSMqRdBITN1VpX7jyfEAjvxBNc7PDfDJZmEPY338ZIY5nFFcmzJSWjVrboFt2sMFv+A==" } ] , "role" : "model" } , "finishReason" : "STOP" , "index" : 0 } ] , # Remainder of response... You can confirm that you received a signature and see what a signature looks like using the following code: # Step 2: Call the model with function declarations # ...Generation config, Configure the client, and Define user prompt (No changes) # Send request with declarations (using a thinking model) response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents ) # See thought signatures for part in response . candidates [ 0 ] . content . parts : if part . thought_signature : print ( "Thought signature:" ) print ( part . thought_signature ) Returning signatures back to the server In order to return signatures back: You should return signatures along with their containing parts back to the server You shouldn't merge a part with a signature with another part which also contains a signature. The signature string is not concatenable You shouldn't merge one part with a signature with another part without a signature. This breaks the correct positioning of the thought represented by the signature. The code will remain the same as in Step 4 of the previous section. But in this case (as indicated in the comment below) you will return signatures to the model along with the result of the function execution so the model can incorporate the thoughts into its final response: Python # Step 4: Create user friendly response with function result and call the model again # ...Create a function response part (No change) # Append thought signatures, function call and result of the function execution to contents function_call_content = response . candidates [ 0 ] . content # Append the model's function call message, which includes thought signatures contents . append ( function_call_content ) contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Step 4: Create user friendly response with function result and call the model again // ...Create a function response part (No change) // Append thought signatures, function call and result of the function execution to contents const function_response_content = response . candidates [ 0 ]. content ; contents . push ( function_response_content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); The following shows what a request returning a thought signature may look like: [{ "contents" : [ { "role" : "user" , "parts" : [ { "text" : "what is the weather in Lake Tahoe?" 
} ] } , { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiIBVKhc7oDPpCaXyJKKssjqr4g3JNOSgJ/M2V+1THC1icsWCmwBVKhc7pBABbZ+zR3e9234WnWWS6GFXmf8IVwpnzjd5KYd7vyJbn/4vTorWBGayj/vbd9JPaZQjxdAIXhoE5mX/MDsQ7M9N/b0qJjHm39tYIBvS4sIWkMDHqTJqXGLzhhKtrTkfbV3RbaJEkQKmwEBVKhc7qVUgC3hfTXZLo9R3AJzUUIx50NKvJTb9B+UU+LBqgg7Nck1x5OpjWVS2R+SsveprIuYOruk2Y0H53J2OJF8qsxTdIq2si8DGW2V7WK8xyoJH5kbqd7drIw1jLb44b6lx4SMyB0VaULuTBki4d+Ljjg1tJTwR0IYMKqDLDZt9mheINsi0ZxcNjfpnDydRXdWbcSwzmK/wgqJAQFUqFzuKgNVElxs3cbO+xebr2IwcOro84nKTisi0tTp9bICPC9fTUhn3L+rvQWA+d3J1Za8at2bakrqiRj7BTh+CVO9fWQMAEQAs3ni0Z2hfaYG92tOD26E4IoZwyYEoWbfNudpH1fr5tEkyqnEGtWIh7H+XoZQ2DXeiOa+br7Zk88SrNE+trJMCogBAVSoXO5e9fBLg7hnbkmKsrzNLnQtLsQm1gNzjcjEC7nJYklYPp0KI2uGBE1PkM8XNsfllAfHVn7LzHcHNlbQ9pJ7QZTSIeG42goS971r5wNZwxaXwCTphClQh826eqJWo6A/28TtAVQWLhTx5ekbP7qb4nh1UblESZ1saxDQAEo4OKPbDzx5BgqKAQFUqFzuVyjNm5i0wN8hTDnKjfpDroEpPPTs531iFy9BOX+xDCdGHy8D+osFpaoBq6TFekQQbz4hIoUR1YEcP4zI80/cNimEeb9IcFxZTTxiNrbhbbcv0969DSMWhB+ZEqIz4vuw4GLe/xcUvqhlChQwFdgIbdOQHSHpatn5uDlktnP/bi26nKuXIwo0AVSoXO7US22OUH7d1f4abNPI0IyAvhqkPp12rbtWLx9vkOtojE8IP+xCfYtIFuZIzRNZqA==" } ] , "role" : "model" } , { "role" : "user" , "parts" : [ { "functionResponse" : { "name" : "getWeather" , "response" : { "response" : { "stringValue" : "Sunny and hot. 90 degrees Fahrenheit" } } } } ] } ] , # Remainder of request... Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page. Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . 
OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' } }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) = > ` ${ key } = ${ val } ` ) . join ( ', ' ); console . log ( ` ${ fn . name } ( ${ args } )` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which automatically converts Python functions to declarations, handles the function call execution and response cycle for you. Following is an example for the disco use case. Note: Automatic Function Calling is a Python SDK only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) - > dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. 
""" return { "status" : f "Disco ball powered { 'on' if power else 'off' } " } def start_music_impl ( energetic : bool , loud : bool ) - > dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) - > dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to this place into party!" , config = config , ) print ( " \n Example 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task. import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) - > dict : """Gets the current weather temperature for a given location.""" print ( f "Tool Call: get_weather_forecast(location= { location } )" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) - > dict : """Sets the thermostat to a desired temperature.""" print ( f "Tool Call: set_thermostat_temperature(temperature= { temperature } )" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. 
The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . I 've set the thermostat to 20°C. JavaScript This example shows how to use JavaScript/TypeScript SDK to do comopositional function calling using a manual execution loop. import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type . NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . It 's 25°C in London, so I' ve set the thermostat to 20 ° C . 
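The Python example above relies on automatic function calling; if you would rather drive the loop yourself in Python (mirroring the JavaScript version), the flow can be sketched roughly as follows. This is a minimal sketch, assuming the get_weather_forecast and set_thermostat_temperature implementations from the Python example above, and it builds their declarations with FunctionDeclaration.from_callable (covered later on this page). Python
from google import genai
from google.genai import types

client = genai.Client()

# Reuse the function implementations from the Python example above and build
# declarations for them with from_callable.
tool_functions = {
    "get_weather_forecast": get_weather_forecast,
    "set_thermostat_temperature": set_thermostat_temperature,
}
declarations = [
    types.FunctionDeclaration.from_callable(callable=fn, client=client)
    for fn in tool_functions.values()
]
config = types.GenerateContentConfig(
    tools=[types.Tool(function_declarations=declarations)],
    # Disable automatic calling so this loop stays in control.
    automatic_function_calling=types.AutomaticFunctionCallingConfig(disable=True),
)

contents = [
    types.Content(
        role="user",
        parts=[types.Part(text=(
            "If it's warmer than 20°C in London, set the thermostat to 20°C, "
            "otherwise set it to 18°C."
        ))],
    )
]

# Keep calling the model until it stops requesting function calls.
while True:
    response = client.models.generate_content(
        model="gemini-2.5-flash", contents=contents, config=config
    )
    if not response.function_calls:
        print(response.text)
        break

    tool_call = response.function_calls[0]
    result = tool_functions[tool_call.name](**tool_call.args)

    # Append the model's turn (including any thought signatures), then the
    # function result, and let the loop call the model again.
    contents.append(response.candidates[0].content)
    contents.append(
        types.Content(
            role="user",
            parts=[types.Part.from_function_response(
                name=tool_call.name, response={"result": result}
            )],
        )
    )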
Compositional function calling is a native Live API feature. This means the Live API can handle function calling similarly to the Python SDK. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? ` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , tools = tools , modality = "AUDIO" ) Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here. toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK automatically converts the Python function to declarations, handles the function call execution and the response cycle for you. The Python SDK then automatically: Detects function call responses from the model. Calls the corresponding Python function in your code. Sends the function response back to the model. Returns the model's final text response.
To use this, define your function with type hints and a docstring, and then pass the function itself (not a JSON declaration) as a tool: Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) - > dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration Automatic schema extraction from Python functions doesn't work in all cases. For example, it doesn't handle cases where you describe the fields of a nested dictionary-object. The API is able to describe any of the following types: Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | dict [ str , AllowedType ]) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools combining native tools with function calling at the same time. Here's an example that enables two tools, Grounding with Google Search and code execution , in a request using the Live API . Note: Multi-tool use is a- Live API only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! ` ; const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } // not defined here. 
]; // Execute the prompt with specified tools in audio modality await run ( prompt , { tools : tools , modality : "AUDIO" }); Python developers can try this out in the Live API Tool Use notebook . Model context protocol (MCP) Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. The Gemini SDKs have built-in support for the MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDK can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until no more tool calls are made by the model. Here, you can find an example of how to use a local MCP server with Gemini and mcp SDK. Python Make sure the latest version of the mcp SDK is installed on your platform of choice. pip install mcp Note: Python supports automatic tool calling by passing in the ClientSession into the tools parameters. If you want to disable it, you can provide automatic_function_calling with disabled True . import os import asyncio from datetime import datetime from mcp import ClientSession , StdioServerParameters from mcp.client.stdio import stdio_client from google import genai client = genai . Client () # Create server parameters for stdio connection server_params = StdioServerParameters ( command = "npx" , # Executable args = [ "-y" , "@philschmid/weather-mcp" ], # MCP Server env = None , # Optional environment variables ) async def run (): async with stdio_client ( server_params ) as ( read , write ): async with ClientSession ( read , write ) as session : # Prompt to get the weather for the current day in London. prompt = f "What is the weather in London in { datetime . now () . strftime ( '%Y-%m- %d ' ) } ?" # Initialize the connection between client and server await session . initialize () # Send request to the model with MCP function declarations response = await client . aio . models . generate_content ( model = "gemini-2.5-flash" , contents = prompt , config = genai . types . GenerateContentConfig ( temperature = 0 , tools = [ session ], # uses the session, will automatically call the tool # Uncomment if you **don't** want the SDK to automatically call the tool # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( # disable=True # ), ), ) print ( response . text ) # Start the asyncio event loop and run the main function asyncio . run ( run ()) JavaScript Make sure the latest version of the mcp SDK is installed on your platform of choice. npm install @modelcontextprotocol/sdk Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool . If you want to disable it, you can provide automaticFunctionCalling with disabled true . 
import { GoogleGenAI , FunctionCallingConfigMode , mcpToTool } from '@google/genai' ; import { Client } from "@modelcontextprotocol/sdk/client/index.js" ; import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js" ; // Create server parameters for stdio connection const serverParams = new StdioClientTransport ({ command : "npx" , // Executable args : [ "-y" , "@philschmid/weather-mcp" ] // MCP Server }); const client = new Client ( { name : "example-client" , version : "1.0.0" } ); // Configure the client const ai = new GoogleGenAI ({}); // Initialize the connection between client and server await client . connect ( serverParams ); // Send request to the model with MCP tools const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : `What is the weather in London in ${ new Date (). toLocaleDateString () } ?` , config : { tools : [ mcpToTool ( client )], // uses the session, will automatically call the tool // Uncomment if you **don't** want the sdk to automatically call the tool // automaticFunctionCalling: { // disable: true, // }, }, }); console . log ( response . text ) // Close the connection await client . close (); Limitations with built-in MCP support Built-in MCP support is a experimental feature in our SDKs and has the following limitations: Only tools are supported, not resources nor prompts It is available for the Python and JavaScript/TypeScript SDK. Breaking changes might occur in future releases. Manual integration of MCP servers is always an option if these limit what you're building. Supported models This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page. Model Function Calling Parallel Function Calling Compositional Function Calling Gemini 2.5 Pro ✔️ ✔️ ✔️ Gemini 2.5 Flash ✔️ ✔️ ✔️ Gemini 2.5 Flash-Lite ✔️ ✔️ ✔️ Gemini 2.0 Flash ✔️ ✔️ ✔️ Gemini 2.0 Flash-Lite X X X Best practices Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments. Naming: Use descriptive function names (without spaces, periods, or dashes). Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum. Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools. Prompt Engineering: Provide context: Tell the model its role (e.g., "You are a helpful weather assistant."). Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts."). Encourage clarification: Instruct the model to ask clarifying questions if needed. Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls. Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it. Error Handling : Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. 
Return informative error messages that the model can use to generate helpful responses to the user (see the sketch at the end of this page). Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls. Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, or break down complex tasks into smaller, more focused function sets. Notes and limitations Only a subset of the OpenAPI schema is supported. Supported parameter types in Python are limited. Automatic function calling is a Python SDK feature only. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-10 UTC.
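To make the error-handling guidance in the best practices above concrete, here is a minimal, hypothetical sketch of a tool function that catches failures and returns informative error messages the model can relay to the user. The endpoint URL and response fields are illustrative assumptions, not part of the Gemini API. Python
import requests  # assumed HTTP client for this illustration


def get_current_temperature(location: str) -> dict:
    """Gets the current temperature for a given location.

    Args:
        location: The city and state, e.g. San Francisco, CA

    Returns:
        A dictionary with the temperature, or an informative error message.
    """
    try:
        # Hypothetical weather endpoint; substitute your real API.
        resp = requests.get(
            "https://weather.example.com/v1/current",
            params={"q": location},
            timeout=5,
        )
        resp.raise_for_status()
        data = resp.json()
        return {"temperature": data["temperature"], "unit": data["unit"]}
    except requests.Timeout:
        # An informative message lets the model explain the failure clearly.
        return {"error": f"The weather service timed out while looking up '{location}'."}
    except (requests.RequestException, KeyError, ValueError) as exc:
        return {"error": f"Could not retrieve the weather for '{location}': {exc}"}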
 
  Title: Function calling with the Gemini API  |  Google AI for Developers
  ==================================================
 
+ Function calling with the Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংল��� ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Function calling with the Gemini API Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. Extend Capabilities: Use external tools to perform computations and extend the limitations of the model, such as using a calculator or creating charts. Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. Get Weather Schedule Meeting Create Chart How function calling works Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: Define Function Declaration: Define the function declaration in your application code. Function Declarations describe the function's name, parameters, and purpose to the model. Call LLM with function declarations: Send user prompt along with the function declaration(s) to the model. It analyzes the request and determines if a function call would be helpful. If so, it responds with a structured JSON object. Execute Function Code (Your Responsibility): The Model does not execute the function itself. It's your application's responsibility to process the response and check for Function Call, if Yes : Extract the name and args of the function and execute the corresponding function in your application. No: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. It will use the result to generate a final, user-friendly response that incorporates the information from the function call. This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn ( parallel function calling ) and in sequence ( compositional function calling ). Step 1: Define a function declaration Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs. Python # Define a function that the model can call to control smart lights set_light_values_declaration = { "name" : "set_light_values" , "description" : "Sets the brightness and color temperature of a light." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "integer" , "description" : "Light level from 0 to 100. 
Zero is off and 100 is full brightness" , }, "color_temp" : { "type" : "string" , "enum" : [ "daylight" , "cool" , "warm" ], "description" : "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`." , }, }, "required" : [ "brightness" , "color_temp" ], }, } # This is the actual function that would be called based on the model's suggestion def set_light_values ( brightness : int , color_temp : str ) - > dict [ str , int | str ]: """Set the brightness and color temperature of a room light. (mock API). Args: brightness: Light level from 0 to 100. Zero is off and 100 is full brightness color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. Returns: A dictionary containing the set brightness and color temperature. """ return { "brightness" : brightness , "colorTemperature" : color_temp } JavaScript import { Type } from '@google/genai' ; // Define a function that the model can call to control smart lights const setLightValuesFunctionDeclaration = { name : 'set_light_values' , description : 'Sets the brightness and color temperature of a light.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'Light level from 0 to 100. Zero is off and 100 is full brightness' , }, color_temp : { type : Type . STRING , enum : [ 'daylight' , 'cool' , 'warm' ], description : 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.' , }, }, required : [ 'brightness' , 'color_temp' ], }, }; /** * Set the brightness and color temperature of a room light. (mock API) * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. * @return {Object} A dictionary containing the set brightness and color temperature. */ function setLightValues ( brightness , color_temp ) { return { brightness : brightness , colorTemperature : color_temp }; } Step 2: Call the model with function declarations Once you have defined your function declarations, you can prompt the model to use them. It analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion. Python from google.genai import types # Configure the client and tools client = genai . Client () tools = types . Tool ( function_declarations = [ set_light_values_declaration ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Define user prompt contents = [ types . Content ( role = "user" , parts = [ types . Part ( text = "Turn the lights down to a romantic level" )] ) ] # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = contents config = config , ) print ( response . candidates [ 0 ] . content . parts [ 0 ] . function_call ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Generation config with function declaration const config = { tools : [{ functionDeclarations : [ setLightValuesFunctionDeclaration ] }] }; // Configure the client const ai = new GoogleGenAI ({}); // Define user prompt const contents = [ { role : 'user' , parts : [{ text : 'Turn the lights down to a romantic level' }] } ]; // Send request with function declarations const response = await ai . models . 
generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( response . functionCalls [ 0 ]); The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question. Python id = None args = { 'color_temp' : 'warm' , 'brightness' : 25 } name = 'set_light_values' JavaScript { name : 'set_light_values' , args : { brightness : 25 , color_temp : 'warm' } } Step 3: Execute set_light_values function code Extract the function call details from the model's response, parse the arguments , and execute the set_light_values function. Python # Extract tool call details, it may not be in the first part. tool_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call if tool_call . name == "set_light_values" : result = set_light_values ( ** tool_call . args ) print ( f "Function execution result: { result } " ) JavaScript // Extract tool call details const tool_call = response . functionCalls [ 0 ] let result ; if ( tool_call . name === 'set_light_values' ) { result = setLightValues ( tool_call . args . brightness , tool_call . args . color_temp ); console . log ( `Function execution result: ${ JSON . stringify ( result ) } ` ); } Step 4: Create user friendly response with function result and call the model again Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. Python # Create a function response part function_response_part = types . Part . from_function_response ( name = tool_call . name , response = { "result" : result }, ) # Append function call and result of the function execution to contents contents . append ( response . candidates [ 0 ] . content ) # Append the content from the model's response. contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Create a function response part const function_response_part = { name : tool_call . name , response : { result } } // Append function call and result of the function execution to contents contents . push ( response . candidates [ 0 ]. content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); // Get the final response from the model const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); This completes the function calling flow. The model successfully used the set_light_values function to perform the request action of the user. Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations . You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. 
Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. You can also construct FunctionDeclarations from Python functions directly using types.FunctionDeclaration.from_callable(client=client, callable=your_function) . Function calling with thinking Enabling " thinking " can improve function call performance by allowing the model to reason through a request before suggesting function calls. Because the Gemini API is stateless, the model's reasoning context is lost between turns in a multi-turn conversation. To preserve this context, you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. The standard pattern for multi-turn tool use is to append the model's complete previous response to the conversation history. The content object includes the thought_signatures automatically. If you follow this pattern, no code changes are required (a minimal sketch of this pattern appears at the end of this section). Manually managing thought signatures If you modify the conversation history manually (instead of sending the complete previous response) and want to benefit from thinking, you must correctly handle the thought_signature included in the model's turn. Follow these rules to ensure the model's context is preserved: Always send the thought_signature back to the model inside its original Part . Don't merge a Part containing a signature with one that does not. This breaks the positional context of the thought. Don't combine two Parts that both contain signatures, as the signature strings cannot be merged. Inspecting Thought Signatures While not necessary for implementation, you can inspect the response to see the thought_signature for debugging or educational purposes. Python import base64 # After receiving a response from a model with thinking enabled # response = client.models.generate_content(...) # The signature is attached to the response part containing the function call part = response . candidates [ 0 ] . content . parts [ 0 ] if part . thought_signature : print ( base64 . b64encode ( part . thought_signature ) . decode ( "utf-8" )) JavaScript // After receiving a response from a model with thinking enabled // const response = await ai.models.generateContent(...) // The signature is attached to the response part containing the function call const part = response . candidates [ 0 ]. content . parts [ 0 ]; if ( part . thoughtSignature ) { console . log ( part . thoughtSignature ); } Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page.
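The following is a minimal Python sketch of the standard pattern described above. It assumes response is the model's previous reply containing a function call part, result holds the output of executing that function in your code, and client, config, and contents are set up as in the earlier examples. Python
from google.genai import types

# The function call the model requested. When thinking is enabled, this Part
# also carries the thought_signature. (It may not always be the first part.)
tool_call = response.candidates[0].content.parts[0].function_call

function_response_part = types.Part.from_function_response(
    name=tool_call.name,
    response={"result": result},
)

# Append the model's complete previous content unchanged. Because the Part
# objects are reused as-is, any thought_signature they carry is sent back
# automatically.
contents.append(response.candidates[0].content)
# Then append the function result as the next user turn.
contents.append(types.Content(role="user", parts=[function_response_part]))

final_response = client.models.generate_content(
    model="gemini-2.5-flash",
    config=config,
    contents=contents,
)
print(final_response.text)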
Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' } }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . 
join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) = > ` ${ key } = ${ val } ` ) . join ( ', ' ); console . log ( ` ${ fn . name } ( ${ args } )` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which automatically converts Python functions to declarations, handles the function call execution and response cycle for you. Following is an example for the disco use case. Note: Automatic Function Calling is a Python SDK only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) - > dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. """ return { "status" : f "Disco ball powered { 'on' if power else 'off' } " } def start_music_impl ( energetic : bool , loud : bool ) - > dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) - > dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to this place into party!" , config = config , ) print ( " \n Example 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. 
The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task. import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) - > dict : """Gets the current weather temperature for a given location.""" print ( f "Tool Call: get_weather_forecast(location= { location } )" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) - > dict : """Sets the thermostat to a desired temperature.""" print ( f "Tool Call: set_thermostat_temperature(temperature= { temperature } )" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . I 've set the thermostat to 20°C. JavaScript This example shows how to use JavaScript/TypeScript SDK to do comopositional function calling using a manual execution loop. import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type . 
NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . It 's 25°C in London, so I' ve set the thermostat to 20 ° C . Compositional function calling is a native Live API feature. This means Live API can handle the function calling similar to the Python SDK. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you write run some python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you write run some python code to turn on the lights, wait 10s and then turn off the lights? ` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , tools = tools , modality = "AUDIO" ) Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the. function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. 
If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here. toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK converts these functions into declarations, manages the function call execution, and handles the response cycle for you. Define your function with type hints and a docstring. For optimal results, it is recommended to use Google-style docstrings. The SDK will then automatically: Detect function call responses from the model. Call the corresponding Python function in your code. Send the function's response back to the model. Return the model's final text response. The SDK currently does not parse argument descriptions into the property description slots of the generated function declaration. Instead, it sends the entire docstring as the top-level function description. Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) - > dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration The API is able to describe any of the following types. Pydantic types are allowed, as long as the fields defined on them are also composed of allowed types. Dict types (like dict[str: int] ) are not well supported here, don't use them. Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | pydantic . 
BaseModel ) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools combining native tools with function calling at the same time. Here's an example that enables two tools, Grounding with Google Search and code execution , in a request using the Live API . Note: Multi-tool use is a- Live API only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! ` ; const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } // not defined here. ]; // Execute the prompt with specified tools in audio modality await run ( prompt , { tools : tools , modality : "AUDIO" }); Python developers can try this out in the Live API Tool Use notebook . Model context protocol (MCP) Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. The Gemini SDKs have built-in support for the MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDK can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until no more tool calls are made by the model. Here, you can find an example of how to use a local MCP server with Gemini and mcp SDK. Python Make sure the latest version of the mcp SDK is installed on your platform of choice. pip install mcp Note: Python supports automatic tool calling by passing in the ClientSession into the tools parameters. If you want to disable it, you can provide automatic_function_calling with disabled True . import os import asyncio from datetime import datetime from mcp import ClientSession , StdioServerParameters from mcp.client.stdio import stdio_client from google import genai client = genai . 
Client () # Create server parameters for stdio connection server_params = StdioServerParameters ( command = "npx" , # Executable args = [ "-y" , "@philschmid/weather-mcp" ], # MCP Server env = None , # Optional environment variables ) async def run (): async with stdio_client ( server_params ) as ( read , write ): async with ClientSession ( read , write ) as session : # Prompt to get the weather for the current day in London. prompt = f "What is the weather in London in { datetime . now () . strftime ( '%Y-%m- %d ' ) } ?" # Initialize the connection between client and server await session . initialize () # Send request to the model with MCP function declarations response = await client . aio . models . generate_content ( model = "gemini-2.5-flash" , contents = prompt , config = genai . types . GenerateContentConfig ( temperature = 0 , tools = [ session ], # uses the session, will automatically call the tool # Uncomment if you **don't** want the SDK to automatically call the tool # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( # disable=True # ), ), ) print ( response . text ) # Start the asyncio event loop and run the main function asyncio . run ( run ()) JavaScript Make sure the latest version of the mcp SDK is installed on your platform of choice. npm install @modelcontextprotocol/sdk Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool . If you want to disable it, you can provide automaticFunctionCalling with disabled true . import { GoogleGenAI , FunctionCallingConfigMode , mcpToTool } from '@google/genai' ; import { Client } from "@modelcontextprotocol/sdk/client/index.js" ; import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js" ; // Create server parameters for stdio connection const serverParams = new StdioClientTransport ({ command : "npx" , // Executable args : [ "-y" , "@philschmid/weather-mcp" ] // MCP Server }); const client = new Client ( { name : "example-client" , version : "1.0.0" } ); // Configure the client const ai = new GoogleGenAI ({}); // Initialize the connection between client and server await client . connect ( serverParams ); // Send request to the model with MCP tools const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : `What is the weather in London in ${ new Date (). toLocaleDateString () } ?` , config : { tools : [ mcpToTool ( client )], // uses the session, will automatically call the tool // Uncomment if you **don't** want the sdk to automatically call the tool // automaticFunctionCalling: { // disable: true, // }, }, }); console . log ( response . text ) // Close the connection await client . close (); Limitations with built-in MCP support Built-in MCP support is an experimental feature in our SDKs and has the following limitations: Only tools are supported, not resources or prompts It is available for the Python and JavaScript/TypeScript SDKs. Breaking changes might occur in future releases. Manual integration of MCP servers is always an option if these limitations affect what you're building. Supported models This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page.
Model Function Calling Parallel Function Calling Compositional Function Calling Gemini 2.5 Pro ✔️ ✔️ ✔️ Gemini 2.5 Flash ✔️ ✔️ ✔️ Gemini 2.5 Flash-Lite ✔️ ✔️ ✔️ Gemini 2.0 Flash ✔️ ✔️ ✔️ Gemini 2.0 Flash-Lite X X X Best practices Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments. Naming: Use descriptive function names (without spaces, periods, or dashes). Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum. Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools. Prompt Engineering: Provide context: Tell the model its role (e.g., "You are a helpful weather assistant."). Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts."). Encourage clarification: Instruct the model to ask clarifying questions if needed. Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls. Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it. Error Handling : Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. Return informative error messages that the model can use to generate helpful responses to the user. Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls. Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, break down complex tasks into smaller, more focused function sets. Notes and limitations Only a subset of the OpenAPI schema is supported. Supported parameter types in Python are limited. Automatic function calling is a Python SDK feature only. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
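To make the error-handling and low-temperature guidance above concrete, here is a small illustrative sketch that is not part of the original page. It assumes a hypothetical get_stock_price tool and a made-up ACME ticker; the point is that the tool returns a structured, informative error instead of raising, so the model can explain the failure to the user. Only SDK calls already shown on this page (genai.Client, GenerateContentConfig with callable tools, generate_content) are used.

Python

from google import genai
from google.genai import types

def get_stock_price(ticker: str) -> dict:
    """Looks up the latest price for a stock ticker (hypothetical mock).

    Args:
        ticker: The stock symbol, e.g. GOOG.

    Returns:
        A dictionary with either the price or an informative error message.
    """
    known_prices = {"GOOG": 182.5}  # stand-in for a real market-data call
    if ticker not in known_prices:
        # Return an informative error instead of raising, so the model can
        # relay the problem and ask the user for a valid symbol.
        return {"error": f"Unknown ticker '{ticker}'. Please provide a valid symbol."}
    return {"ticker": ticker, "price": known_prices[ticker], "currency": "USD"}

client = genai.Client()
config = types.GenerateContentConfig(
    tools=[get_stock_price],  # automatic function calling (Python SDK only)
    temperature=0,            # low temperature for more reliable tool calls
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is ACME trading at right now?",
    config=config,
)
print(response.text)  # the model can now explain that ACME is not a known ticker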
text_content/docs_function-calling_cd2b43ee.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/function-calling#automatic_function_c
2
  Title: Function calling with the Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Function calling with the Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Function calling with the Gemini API Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. Extend Capabilities: Use external tools to perform computations and extend the limitations of the model, such as using a calculator or creating charts. Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. Get Weather Schedule Meeting Create Chart How function calling works Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: Define Function Declaration: Define the function declaration in your application code. Function Declarations describe the function's name, parameters, and purpose to the model. Call LLM with function declarations: Send user prompt along with the function declaration(s) to the model. It analyzes the request and determines if a function call would be helpful. If so, it responds with a structured JSON object. Execute Function Code (Your Responsibility): The Model does not execute the function itself. It's your application's responsibility to process the response and check for Function Call, if Yes : Extract the name and args of the function and execute the corresponding function in your application. No: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. It will use the result to generate a final, user-friendly response that incorporates the information from the function call. This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn ( parallel function calling ) and in sequence ( compositional function calling ). Step 1: Define a function declaration Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs. Python # Define a function that the model can call to control smart lights set_light_values_declaration = { "name" : "set_light_values" , "description" : "Sets the brightness and color temperature of a light." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "integer" , "description" : "Light level from 0 to 100. 
Zero is off and 100 is full brightness" , }, "color_temp" : { "type" : "string" , "enum" : [ "daylight" , "cool" , "warm" ], "description" : "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`." , }, }, "required" : [ "brightness" , "color_temp" ], }, } # This is the actual function that would be called based on the model's suggestion def set_light_values ( brightness : int , color_temp : str ) - > dict [ str , int | str ]: """Set the brightness and color temperature of a room light. (mock API). Args: brightness: Light level from 0 to 100. Zero is off and 100 is full brightness color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. Returns: A dictionary containing the set brightness and color temperature. """ return { "brightness" : brightness , "colorTemperature" : color_temp } JavaScript import { Type } from '@google/genai' ; // Define a function that the model can call to control smart lights const setLightValuesFunctionDeclaration = { name : 'set_light_values' , description : 'Sets the brightness and color temperature of a light.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'Light level from 0 to 100. Zero is off and 100 is full brightness' , }, color_temp : { type : Type . STRING , enum : [ 'daylight' , 'cool' , 'warm' ], description : 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.' , }, }, required : [ 'brightness' , 'color_temp' ], }, }; /** * Set the brightness and color temperature of a room light. (mock API) * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. * @return {Object} A dictionary containing the set brightness and color temperature. */ function setLightValues ( brightness , color_temp ) { return { brightness : brightness , colorTemperature : color_temp }; } Step 2: Call the model with function declarations Once you have defined your function declarations, you can prompt the model to use them. It analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion. Python from google.genai import types # Configure the client and tools client = genai . Client () tools = types . Tool ( function_declarations = [ set_light_values_declaration ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Define user prompt contents = [ types . Content ( role = "user" , parts = [ types . Part ( text = "Turn the lights down to a romantic level" )] ) ] # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = contents config = config , ) print ( response . candidates [ 0 ] . content . parts [ 0 ] . function_call ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Generation config with function declaration const config = { tools : [{ functionDeclarations : [ setLightValuesFunctionDeclaration ] }] }; // Configure the client const ai = new GoogleGenAI ({}); // Define user prompt const contents = [ { role : 'user' , parts : [{ text : 'Turn the lights down to a romantic level' }] } ]; // Send request with function declarations const response = await ai . models . 
generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( response . functionCalls [ 0 ]); The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question. Python id = None args = { 'color_temp' : 'warm' , 'brightness' : 25 } name = 'set_light_values' JavaScript { name : 'set_light_values' , args : { brightness : 25 , color_temp : 'warm' } } Step 3: Execute set_light_values function code Extract the function call details from the model's response, parse the arguments , and execute the set_light_values function. Python # Extract tool call details, it may not be in the first part. tool_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call if tool_call . name == "set_light_values" : result = set_light_values ( ** tool_call . args ) print ( f "Function execution result: { result } " ) JavaScript // Extract tool call details const tool_call = response . functionCalls [ 0 ] let result ; if ( tool_call . name === 'set_light_values' ) { result = setLightValues ( tool_call . args . brightness , tool_call . args . color_temp ); console . log ( `Function execution result: ${ JSON . stringify ( result ) } ` ); } Step 4: Create user friendly response with function result and call the model again Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. Python # Create a function response part function_response_part = types . Part . from_function_response ( name = tool_call . name , response = { "result" : result }, ) # Append function call and result of the function execution to contents contents . append ( response . candidates [ 0 ] . content ) # Append the content from the model's response. contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Create a function response part const function_response_part = { name : tool_call . name , response : { result } } // Append function call and result of the function execution to contents contents . push ( response . candidates [ 0 ]. content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); // Get the final response from the model const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); This completes the function calling flow. The model successfully used the set_light_values function to perform the request action of the user. Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations . You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. 
Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. Function calling with thinking Enabling "thinking" can improve function call performance by allowing the model to reason through a request before suggesting function calls. However, because the Gemini API is stateless, this reasoning context is lost between turns, which can reduce the quality of function calls as they require multiple turn requests. To preserve this context you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. To use thought signatures: Receive the signature: When thinking is enabled, the API response will include a thought_signature field containing an encrypted representation of the model's reasoning. Return the signature: When you send the function's execution result back to the server, include the thought_signature you received. This allows the model to restore its previous thinking context and will likely result in better function calling performance. Receiving signatures from the server Signatures are returned in the part after the model's thinking phase, which typically is a text or function call. Here are some examples of what thought signatures look like returned in each type of part, in response to the request "What's the weather in Lake Tahoe?" using the Get Weather example: Text part [{ "candidates" : [ { "content" : { "parts" : [ { "text" : "Here's what the weather in Lake Tahoe is today" , "thoughtSignature" : "ClcBVKhc7ru7KzUI7SrdUoIdAYLm/+i93aHjfIt4xHyAoO/G70tApxnK2ujBhOhC1PrRy1pkQa88fqFvpHNVd1HDjNLO7mkp6/hFwE+SPPEB3fh0hs4oM8MKhgIBVKhc7uIGvrS7i/T4HpfbnYrluFfWNjZ62gewqe4cVdR/Dlh+zbjtYmDD0gPZ+SuBO7vvHQdzsjePRP+2Y5XddX6LEf/cGGgakq8EhVvw/a6IVzUO6XmpHg2Ag1sl8E9+VFH/lC0R0ZuYdFWligtDuYwp5p5q3o59G0TtWeU2MC1y2MJfE9u/KWd313ldka80/X2W/xF2O/4djMp5G2WKcULfve75zeRCy0mc5iS3SB9mTH0cT6x0vtKjeBx50gcg+CQWtJcRuwTVzz54dmvmK9xvnqA8gKGw3DuaM9wfy5hyY7Qg0z3iyyWdP8T/lbjKim8IEQOk7O1vVwP1Ko7oMYH8JgA1CsoBAVSoXO6v4c5RSyd1cn6EIU0pEFQsjW7rYWPuZdOFq/tsGJT9BCfW7KGkPGwlNSq8jTJFvbcJ/DjtndISQYXwiXd2kGa5JfdS2Kh4zOxCxiWtOk+2nCc3+XQk2nonhO+esGJpkDdbbHZSqRgcUtYKq7q28iPFOQvOFyCiZNB7K86Z/6Hnagu2snSlN/BcTMaFGaWpcCClSUo4foRZn3WbNCoM8rcpD7qEJMp4a5baaSxyyeL1ZTGd2HLpFys/oiW6e3oAnhxuIysCwg==" } ] , "role" : "model" } , "index" : 0 } ] , # Remainder of response... 
Function call part [{ "candidates" : [ { "content" : { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiwBVKhc7nRyTi3HmggPD9iQiRc261f5jwuMdw3H/itDH0emsb9ZVo3Nwx9p6wpsAVSoXO5i8fDV4jBSBLoaWxB5zUdlGY6aIGp+I0oEnwRRSRQ1LOvrDlojEH8JE8HjiKXALdJrvNPiG+HY3GZEO8pZjEZtc3UoBUh7+SVyjK7Xolu7aRYYeUyzrCapoETWypER1jbrJXnFV23hCosBAVSoXO6oIPNJSmbuEDfGafOhuCSHkpr1yjTp35RXYqmCESzRzWf5+nFXLqncqeFo4ohoxbiYQVpVQbOZF81p8o9zg6xeRE7qMeOv+XN7enXGJ4/s3qNFQpfkSMqRdBITN1VpX7jyfEAjvxBNc7PDfDJZmEPY338ZIY5nFFcmzJSWjVrboFt2sMFv+A==" } ] , "role" : "model" } , "finishReason" : "STOP" , "index" : 0 } ] , # Remainder of response... You can confirm that you received a signature and see what a signature looks like using the following code: # Step 2: Call the model with function declarations # ...Generation config, Configure the client, and Define user prompt (No changes) # Send request with declarations (using a thinking model) response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents ) # See thought signatures for part in response . candidates [ 0 ] . content . parts : if part . thought_signature : print ( "Thought signature:" ) print ( part . thought_signature ) Returning signatures back to the server In order to return signatures back: You should return signatures along with their containing parts back to the server You shouldn't merge a part with a signature with another part which also contains a signature. The signature string is not concatenable You shouldn't merge one part with a signature with another part without a signature. This breaks the correct positioning of the thought represented by the signature. The code will remain the same as in Step 4 of the previous section. But in this case (as indicated in the comment below) you will return signatures to the model along with the result of the function execution so the model can incorporate the thoughts into its final response: Python # Step 4: Create user friendly response with function result and call the model again # ...Create a function response part (No change) # Append thought signatures, function call and result of the function execution to contents function_call_content = response . candidates [ 0 ] . content # Append the model's function call message, which includes thought signatures contents . append ( function_call_content ) contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Step 4: Create user friendly response with function result and call the model again // ...Create a function response part (No change) // Append thought signatures, function call and result of the function execution to contents const function_response_content = response . candidates [ 0 ]. content ; contents . push ( function_response_content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); The following shows what a request returning a thought signature may look like: [{ "contents" : [ { "role" : "user" , "parts" : [ { "text" : "what is the weather in Lake Tahoe?" 
} ] } , { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiIBVKhc7oDPpCaXyJKKssjqr4g3JNOSgJ/M2V+1THC1icsWCmwBVKhc7pBABbZ+zR3e9234WnWWS6GFXmf8IVwpnzjd5KYd7vyJbn/4vTorWBGayj/vbd9JPaZQjxdAIXhoE5mX/MDsQ7M9N/b0qJjHm39tYIBvS4sIWkMDHqTJqXGLzhhKtrTkfbV3RbaJEkQKmwEBVKhc7qVUgC3hfTXZLo9R3AJzUUIx50NKvJTb9B+UU+LBqgg7Nck1x5OpjWVS2R+SsveprIuYOruk2Y0H53J2OJF8qsxTdIq2si8DGW2V7WK8xyoJH5kbqd7drIw1jLb44b6lx4SMyB0VaULuTBki4d+Ljjg1tJTwR0IYMKqDLDZt9mheINsi0ZxcNjfpnDydRXdWbcSwzmK/wgqJAQFUqFzuKgNVElxs3cbO+xebr2IwcOro84nKTisi0tTp9bICPC9fTUhn3L+rvQWA+d3J1Za8at2bakrqiRj7BTh+CVO9fWQMAEQAs3ni0Z2hfaYG92tOD26E4IoZwyYEoWbfNudpH1fr5tEkyqnEGtWIh7H+XoZQ2DXeiOa+br7Zk88SrNE+trJMCogBAVSoXO5e9fBLg7hnbkmKsrzNLnQtLsQm1gNzjcjEC7nJYklYPp0KI2uGBE1PkM8XNsfllAfHVn7LzHcHNlbQ9pJ7QZTSIeG42goS971r5wNZwxaXwCTphClQh826eqJWo6A/28TtAVQWLhTx5ekbP7qb4nh1UblESZ1saxDQAEo4OKPbDzx5BgqKAQFUqFzuVyjNm5i0wN8hTDnKjfpDroEpPPTs531iFy9BOX+xDCdGHy8D+osFpaoBq6TFekQQbz4hIoUR1YEcP4zI80/cNimEeb9IcFxZTTxiNrbhbbcv0969DSMWhB+ZEqIz4vuw4GLe/xcUvqhlChQwFdgIbdOQHSHpatn5uDlktnP/bi26nKuXIwo0AVSoXO7US22OUH7d1f4abNPI0IyAvhqkPp12rbtWLx9vkOtojE8IP+xCfYtIFuZIzRNZqA==" } ] , "role" : "model" } , { "role" : "user" , "parts" : [ { "functionResponse" : { "name" : "getWeather" , "response" : { "response" : { "stringValue" : "Sunny and hot. 90 degrees Fahrenheit" } } } } ] } ] , # Remainder of request... Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page. Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . 
OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' } }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) = > ` ${ key } = ${ val } ` ) . join ( ', ' ); console . log ( ` ${ fn . name } ( ${ args } )` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which automatically converts Python functions to declarations, handles the function call execution and response cycle for you. Following is an example for the disco use case. Note: Automatic Function Calling is a Python SDK only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) - > dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. 
""" return { "status" : f "Disco ball powered { 'on' if power else 'off' } " } def start_music_impl ( energetic : bool , loud : bool ) - > dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) - > dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to this place into party!" , config = config , ) print ( " \n Example 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task. import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) - > dict : """Gets the current weather temperature for a given location.""" print ( f "Tool Call: get_weather_forecast(location= { location } )" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) - > dict : """Sets the thermostat to a desired temperature.""" print ( f "Tool Call: set_thermostat_temperature(temperature= { temperature } )" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. 
The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . I 've set the thermostat to 20°C. JavaScript This example shows how to use JavaScript/TypeScript SDK to do comopositional function calling using a manual execution loop. import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type . NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . It 's 25°C in London, so I' ve set the thermostat to 20 ° C . 
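If you prefer not to rely on the SDK's automatic execution, the same compositional flow can be driven by a manual loop in Python, mirroring the JavaScript loop above. The following is an illustrative sketch only: it reuses mock get_weather_forecast and set_thermostat_temperature functions like the ones defined in the Python example, together with the automatic_function_calling disable option and Part.from_function_response helper shown elsewhere on this page.

Python

from google import genai
from google.genai import types

def get_weather_forecast(location: str) -> dict:
    """Mock weather lookup (mirrors the example above)."""
    return {"temperature": 25, "unit": "celsius"}

def set_thermostat_temperature(temperature: int) -> dict:
    """Mock thermostat call (mirrors the example above)."""
    return {"status": "success"}

client = genai.Client()
config = types.GenerateContentConfig(
    tools=[get_weather_forecast, set_thermostat_temperature],
    # Disable automatic execution so we can run the loop ourselves.
    automatic_function_calling=types.AutomaticFunctionCallingConfig(disable=True),
)

contents = [
    types.Content(
        role="user",
        parts=[types.Part(text="If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C.")],
    )
]

# Loop until the model stops requesting function calls.
while True:
    response = client.models.generate_content(
        model="gemini-2.5-flash", contents=contents, config=config
    )
    if not response.function_calls:
        print(response.text)
        break
    tool_call = response.function_calls[0]
    # Dispatch to the matching local implementation.
    implementations = {
        "get_weather_forecast": get_weather_forecast,
        "set_thermostat_temperature": set_thermostat_temperature,
    }
    result = implementations[tool_call.name](**tool_call.args)
    # Append the model's function call turn, then the function response.
    contents.append(response.candidates[0].content)
    contents.append(
        types.Content(
            role="user",
            parts=[types.Part.from_function_response(name=tool_call.name, response={"result": result})],
        )
    )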
Compositional function calling is a native Live API feature. This means Live API can handle the function calling similar to the Python SDK. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you write run some python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you write run some python code to turn on the lights, wait 10s and then turn off the lights? ` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , tools = tools , modality = "AUDIO" ) Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the. function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here. toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK automatically converts the Python function to declarations, handles the function call execution and the response cycle for you. The Python SDK then automatically: Detects function call responses from the model. Call the corresponding Python function in your code. Sends the function response back to the model. Returns the model's final text response. 
To use this, define your function with type hints and a docstring, and then pass the function itself (not a JSON declaration) as a tool: Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) - > dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration Automatic schema extraction from Python functions doesn't work in all cases. For example, it doesn't handle cases where you describe the fields of a nested dictionary-object. The API is able to describe any of the following types: Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | dict [ str , AllowedType ]) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools combining native tools with function calling at the same time. Here's an example that enables two tools, Grounding with Google Search and code execution , in a request using the Live API . Note: Multi-tool use is a- Live API only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! ` ; const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } // not defined here. 
]; // Execute the prompt with specified tools in audio modality await run ( prompt , { tools : tools , modality : "AUDIO" }); Python developers can try this out in the Live API Tool Use notebook . Model context protocol (MCP) Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. The Gemini SDKs have built-in support for the MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDK can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until no more tool calls are made by the model. Here, you can find an example of how to use a local MCP server with Gemini and mcp SDK. Python Make sure the latest version of the mcp SDK is installed on your platform of choice. pip install mcp Note: Python supports automatic tool calling by passing in the ClientSession into the tools parameters. If you want to disable it, you can provide automatic_function_calling with disabled True . import os import asyncio from datetime import datetime from mcp import ClientSession , StdioServerParameters from mcp.client.stdio import stdio_client from google import genai client = genai . Client () # Create server parameters for stdio connection server_params = StdioServerParameters ( command = "npx" , # Executable args = [ "-y" , "@philschmid/weather-mcp" ], # MCP Server env = None , # Optional environment variables ) async def run (): async with stdio_client ( server_params ) as ( read , write ): async with ClientSession ( read , write ) as session : # Prompt to get the weather for the current day in London. prompt = f "What is the weather in London in { datetime . now () . strftime ( '%Y-%m- %d ' ) } ?" # Initialize the connection between client and server await session . initialize () # Send request to the model with MCP function declarations response = await client . aio . models . generate_content ( model = "gemini-2.5-flash" , contents = prompt , config = genai . types . GenerateContentConfig ( temperature = 0 , tools = [ session ], # uses the session, will automatically call the tool # Uncomment if you **don't** want the SDK to automatically call the tool # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( # disable=True # ), ), ) print ( response . text ) # Start the asyncio event loop and run the main function asyncio . run ( run ()) JavaScript Make sure the latest version of the mcp SDK is installed on your platform of choice. npm install @modelcontextprotocol/sdk Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool . If you want to disable it, you can provide automaticFunctionCalling with disabled true . 
import { GoogleGenAI , FunctionCallingConfigMode , mcpToTool } from '@google/genai' ; import { Client } from "@modelcontextprotocol/sdk/client/index.js" ; import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js" ; // Create server parameters for stdio connection const serverParams = new StdioClientTransport ({ command : "npx" , // Executable args : [ "-y" , "@philschmid/weather-mcp" ] // MCP Server }); const client = new Client ( { name : "example-client" , version : "1.0.0" } ); // Configure the client const ai = new GoogleGenAI ({}); // Initialize the connection between client and server await client . connect ( serverParams ); // Send request to the model with MCP tools const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : `What is the weather in London in ${ new Date (). toLocaleDateString () } ?` , config : { tools : [ mcpToTool ( client )], // uses the session, will automatically call the tool // Uncomment if you **don't** want the sdk to automatically call the tool // automaticFunctionCalling: { // disable: true, // }, }, }); console . log ( response . text ) // Close the connection await client . close (); Limitations with built-in MCP support Built-in MCP support is a experimental feature in our SDKs and has the following limitations: Only tools are supported, not resources nor prompts It is available for the Python and JavaScript/TypeScript SDK. Breaking changes might occur in future releases. Manual integration of MCP servers is always an option if these limit what you're building. Supported models This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page. Model Function Calling Parallel Function Calling Compositional Function Calling Gemini 2.5 Pro ✔️ ✔️ ✔️ Gemini 2.5 Flash ✔️ ✔️ ✔️ Gemini 2.5 Flash-Lite ✔️ ✔️ ✔️ Gemini 2.0 Flash ✔️ ✔️ ✔️ Gemini 2.0 Flash-Lite X X X Best practices Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments. Naming: Use descriptive function names (without spaces, periods, or dashes). Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum. Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools. Prompt Engineering: Provide context: Tell the model its role (e.g., "You are a helpful weather assistant."). Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts."). Encourage clarification: Instruct the model to ask clarifying questions if needed. Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls. Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it. Error Handling : Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. 
Return informative error messages that the model can use to generate helpful responses to the user. Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls. Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, break down complex tasks into smaller, more focused function sets. Notes and limitations Only a subset of the OpenAPI schema is supported. Supported parameter types in Python are limited. Automatic function calling is a Python SDK feature only. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-10 UTC.
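As an illustration of the validation best practice above (confirm consequential calls with the user before executing them), the following sketch walks through the manual function calling steps described earlier with a hypothetical place_order tool. The tool name, schema, and confirmation prompt are assumptions for demonstration, not part of the original page.

Python

from google import genai
from google.genai import types

# Hypothetical, consequential tool: placing an order should be confirmed first.
place_order_declaration = {
    "name": "place_order",
    "description": "Places an order for the given item and quantity.",
    "parameters": {
        "type": "object",
        "properties": {
            "item": {"type": "string", "description": "Name of the item to order."},
            "quantity": {"type": "integer", "description": "How many units to order."},
        },
        "required": ["item", "quantity"],
    },
}

def place_order(item: str, quantity: int) -> dict:
    """Mock implementation of the hypothetical ordering API."""
    return {"status": "ordered", "item": item, "quantity": quantity}

client = genai.Client()
config = types.GenerateContentConfig(
    tools=[types.Tool(function_declarations=[place_order_declaration])]
)

contents = [types.Content(role="user", parts=[types.Part(text="Order 3 reams of printer paper.")])]
response = client.models.generate_content(
    model="gemini-2.5-flash", contents=contents, config=config
)

tool_call = response.candidates[0].content.parts[0].function_call
if tool_call and tool_call.name == "place_order":
    # Validation step: confirm the consequential action with the user before executing.
    answer = input(f"Place order with arguments {dict(tool_call.args)}? [y/N] ")
    if answer.strip().lower() == "y":
        result = place_order(**tool_call.args)
    else:
        result = {"status": "cancelled", "reason": "User declined the order."}
    # Send the outcome back so the model can produce a user-facing reply.
    contents.append(response.candidates[0].content)
    contents.append(
        types.Content(
            role="user",
            parts=[types.Part.from_function_response(name=tool_call.name, response={"result": result})],
        )
    )
    final_response = client.models.generate_content(
        model="gemini-2.5-flash", contents=contents, config=config
    )
    print(final_response.text)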
 
2
  Title: Function calling with the Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Function calling with the Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Function calling with the Gemini API Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. Extend Capabilities: Use external tools to perform computations and extend the limitations of the model, such as using a calculator or creating charts. Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. Get Weather Schedule Meeting Create Chart How function calling works Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: Define Function Declaration: Define the function declaration in your application code. Function Declarations describe the function's name, parameters, and purpose to the model. Call LLM with function declarations: Send user prompt along with the function declaration(s) to the model. It analyzes the request and determines if a function call would be helpful. If so, it responds with a structured JSON object. Execute Function Code (Your Responsibility): The Model does not execute the function itself. It's your application's responsibility to process the response and check for Function Call, if Yes : Extract the name and args of the function and execute the corresponding function in your application. No: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. It will use the result to generate a final, user-friendly response that incorporates the information from the function call. This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn ( parallel function calling ) and in sequence ( compositional function calling ). Step 1: Define a function declaration Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs. Python # Define a function that the model can call to control smart lights set_light_values_declaration = { "name" : "set_light_values" , "description" : "Sets the brightness and color temperature of a light." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "integer" , "description" : "Light level from 0 to 100. 
Zero is off and 100 is full brightness" , }, "color_temp" : { "type" : "string" , "enum" : [ "daylight" , "cool" , "warm" ], "description" : "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`." , }, }, "required" : [ "brightness" , "color_temp" ], }, } # This is the actual function that would be called based on the model's suggestion def set_light_values ( brightness : int , color_temp : str ) - > dict [ str , int | str ]: """Set the brightness and color temperature of a room light. (mock API). Args: brightness: Light level from 0 to 100. Zero is off and 100 is full brightness color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. Returns: A dictionary containing the set brightness and color temperature. """ return { "brightness" : brightness , "colorTemperature" : color_temp } JavaScript import { Type } from '@google/genai' ; // Define a function that the model can call to control smart lights const setLightValuesFunctionDeclaration = { name : 'set_light_values' , description : 'Sets the brightness and color temperature of a light.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'Light level from 0 to 100. Zero is off and 100 is full brightness' , }, color_temp : { type : Type . STRING , enum : [ 'daylight' , 'cool' , 'warm' ], description : 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.' , }, }, required : [ 'brightness' , 'color_temp' ], }, }; /** * Set the brightness and color temperature of a room light. (mock API) * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. * @return {Object} A dictionary containing the set brightness and color temperature. */ function setLightValues ( brightness , color_temp ) { return { brightness : brightness , colorTemperature : color_temp }; } Step 2: Call the model with function declarations Once you have defined your function declarations, you can prompt the model to use them. It analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion. Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () tools = types . Tool ( function_declarations = [ set_light_values_declaration ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Define user prompt contents = [ types . Content ( role = "user" , parts = [ types . Part ( text = "Turn the lights down to a romantic level" )] ) ] # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = contents , config = config , ) print ( response . candidates [ 0 ] . content . parts [ 0 ] . function_call ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Generation config with function declaration const config = { tools : [{ functionDeclarations : [ setLightValuesFunctionDeclaration ] }] }; // Configure the client const ai = new GoogleGenAI ({}); // Define user prompt const contents = [ { role : 'user' , parts : [{ text : 'Turn the lights down to a romantic level' }] } ]; // Send request with function declarations const response = await ai . models . 
generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( response . functionCalls [ 0 ]); The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question. Python id = None args = { 'color_temp' : 'warm' , 'brightness' : 25 } name = 'set_light_values' JavaScript { name : 'set_light_values' , args : { brightness : 25 , color_temp : 'warm' } } Step 3: Execute set_light_values function code Extract the function call details from the model's response, parse the arguments , and execute the set_light_values function. Python # Extract tool call details, it may not be in the first part. tool_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call if tool_call . name == "set_light_values" : result = set_light_values ( ** tool_call . args ) print ( f "Function execution result: { result } " ) JavaScript // Extract tool call details const tool_call = response . functionCalls [ 0 ] let result ; if ( tool_call . name === 'set_light_values' ) { result = setLightValues ( tool_call . args . brightness , tool_call . args . color_temp ); console . log ( `Function execution result: ${ JSON . stringify ( result ) } ` ); } Step 4: Create user friendly response with function result and call the model again Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. Python # Create a function response part function_response_part = types . Part . from_function_response ( name = tool_call . name , response = { "result" : result }, ) # Append function call and result of the function execution to contents contents . append ( response . candidates [ 0 ] . content ) # Append the content from the model's response. contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Create a function response part const function_response_part = { name : tool_call . name , response : { result } } // Append function call and result of the function execution to contents contents . push ( response . candidates [ 0 ]. content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); // Get the final response from the model const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); This completes the function calling flow. The model successfully used the set_light_values function to perform the requested action for the user. Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations . You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. 
Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. You can also construct FunctionDeclarations from Python functions directly using types.FunctionDeclaration.from_callable(client=client, callable=your_function) . Function calling with thinking Enabling " thinking " can improve function call performance by allowing the model to reason through a request before suggesting function calls. The Gemini API is stateless, so the model's reasoning context will be lost between turns in a multi-turn conversation. To preserve this context, you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. The standard pattern for multi-turn tool use is to append the model's complete previous response to the conversation history. The content object includes the thought_signatures automatically. If you follow this pattern, no code changes are required. Manually managing thought signatures If you modify the conversation history manually instead of sending the complete previous response, and you still want to benefit from thinking, you must correctly handle the thought_signature included in the model's turn. Follow these rules to ensure the model's context is preserved (a short sketch follows at the end of this section): Always send the thought_signature back to the model inside its original Part . Don't merge a Part containing a signature with one that does not. This breaks the positional context of the thought. Don't combine two Parts that both contain signatures, as the signature strings cannot be merged. Inspecting Thought Signatures While not necessary for implementation, you can inspect the response to see the thought_signature for debugging or educational purposes. Python import base64 # After receiving a response from a model with thinking enabled # response = client.models.generate_content(...) # The signature is attached to the response part containing the function call part = response . candidates [ 0 ] . content . parts [ 0 ] if part . thought_signature : print ( base64 . b64encode ( part . thought_signature ) . decode ( "utf-8" )) JavaScript // After receiving a response from a model with thinking enabled // const response = await ai.models.generateContent(...) // The signature is attached to the response part containing the function call const part = response . candidates [ 0 ]. content . parts [ 0 ]; if ( part . thoughtSignature ) { console . log ( part . thoughtSignature ); } Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page. 
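To make the rules above concrete, here is a minimal sketch (not part of the official documentation) of manually rebuilding a trimmed history while keeping each signature-bearing Part intact. It assumes the response , config , client and function_response_part variables from the earlier steps; the trimmed user prompt is only illustrative.

from google.genai import types

# Hypothetical sketch: rebuild a trimmed history without breaking thought signatures.
model_turn = response.candidates[0].content

# Keep each Part exactly as returned; a Part that carries a thought_signature must be
# passed back unmodified and must not be merged with (or split from) any other Part.
trimmed_history = [
    types.Content(role="user", parts=[types.Part(text="Turn the lights down to a romantic level")]),
    types.Content(role="model", parts=list(model_turn.parts)),   # signatures stay in their original Parts
    types.Content(role="user", parts=[function_response_part]),  # result of the executed function
]

final_response = client.models.generate_content(
    model="gemini-2.5-flash",
    config=config,
    contents=trimmed_history,
)
print(final_response.text)

This mirrors the standard pattern of appending the model's previous response; the only difference is that the surrounding history has been reconstructed by hand.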
Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' } }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . 
join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) = > ` ${ key } = ${ val } ` ) . join ( ', ' ); console . log ( ` ${ fn . name } ( ${ args } )` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which automatically converts Python functions to declarations and handles the function call execution and response cycle for you. Following is an example for the disco use case. Note: Automatic Function Calling is a Python SDK only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) - > dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. """ return { "status" : f "Disco ball powered { 'on' if power else 'off' } " } def start_music_impl ( energetic : bool , loud : bool ) - > dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) - > dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to turn this place into a party!" , config = config , ) print ( " \n Example 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. 
The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task. import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) - > dict : """Gets the current weather temperature for a given location.""" print ( f "Tool Call: get_weather_forecast(location= { location } )" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) - > dict : """Sets the thermostat to a desired temperature.""" print ( f "Tool Call: set_thermostat_temperature(temperature= { temperature } )" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . I 've set the thermostat to 20°C. JavaScript This example shows how to use the JavaScript/TypeScript SDK to do compositional function calling using a manual execution loop. import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type . 
NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the loop orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . It 's 25°C in London, so I' ve set the thermostat to 20 ° C . Compositional function calling is a native Live API feature. This means the Live API can handle function calling similarly to the Python SDK. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you run some python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you run some python code to turn on the lights, wait 10s and then turn off the lights? ` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , { tools : tools , modality : "AUDIO" }); Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. 
If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here. toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK converts these functions into declarations, manages the function call execution, and handles the response cycle for you. Define your function with type hints and a docstring. For optimal results, it is recommended to use Google-style docstrings. The SDK will then automatically: Detect function call responses from the model. Call the corresponding Python function in your code. Send the function's response back to the model. Return the model's final text response. The SDK currently does not parse argument descriptions into the property description slots of the generated function declaration. Instead, it sends the entire docstring as the top-level function description. Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) - > dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration The API is able to describe any of the following types. Pydantic types are allowed, as long as the fields defined on them are also composed of allowed types. Dict types (like dict[str, int] ) are not well supported here, so don't use them. Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | pydantic . 
BaseModel ) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools at the same time, combining native tools with function calling. Here's an example that enables two tools, Grounding with Google Search and code execution , in a request using the Live API . Note: Multi-tool use is a Live API-only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! ` ; const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } // not defined here. ]; // Execute the prompt with specified tools in audio modality await run ( prompt , { tools : tools , modality : "AUDIO" }); Python developers can try this out in the Live API Tool Use notebook . Model context protocol (MCP) Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. The Gemini SDKs have built-in support for MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDKs can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until no more tool calls are made by the model. Here, you can find an example of how to use a local MCP server with Gemini and the mcp SDK. Python Make sure the latest version of the mcp SDK is installed on your platform of choice. pip install mcp Note: Python supports automatic tool calling by passing in the ClientSession into the tools parameters. If you want to disable it, you can provide automatic_function_calling with disable set to True . import os import asyncio from datetime import datetime from mcp import ClientSession , StdioServerParameters from mcp.client.stdio import stdio_client from google import genai client = genai . 
Client () # Create server parameters for stdio connection server_params = StdioServerParameters ( command = "npx" , # Executable args = [ "-y" , "@philschmid/weather-mcp" ], # MCP Server env = None , # Optional environment variables ) async def run (): async with stdio_client ( server_params ) as ( read , write ): async with ClientSession ( read , write ) as session : # Prompt to get the weather for the current day in London. prompt = f "What is the weather in London in { datetime . now () . strftime ( '%Y-%m-%d' ) } ?" # Initialize the connection between client and server await session . initialize () # Send request to the model with MCP function declarations response = await client . aio . models . generate_content ( model = "gemini-2.5-flash" , contents = prompt , config = genai . types . GenerateContentConfig ( temperature = 0 , tools = [ session ], # uses the session, will automatically call the tool # Uncomment if you **don't** want the SDK to automatically call the tool # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( # disable=True # ), ), ) print ( response . text ) # Start the asyncio event loop and run the main function asyncio . run ( run ()) JavaScript Make sure the latest version of the mcp SDK is installed on your platform of choice. npm install @modelcontextprotocol/sdk Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool . If you want to disable it, you can provide automaticFunctionCalling with disable set to true . import { GoogleGenAI , FunctionCallingConfigMode , mcpToTool } from '@google/genai' ; import { Client } from "@modelcontextprotocol/sdk/client/index.js" ; import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js" ; // Create server parameters for stdio connection const serverParams = new StdioClientTransport ({ command : "npx" , // Executable args : [ "-y" , "@philschmid/weather-mcp" ] // MCP Server }); const client = new Client ( { name : "example-client" , version : "1.0.0" } ); // Configure the client const ai = new GoogleGenAI ({}); // Initialize the connection between client and server await client . connect ( serverParams ); // Send request to the model with MCP tools const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : `What is the weather in London in ${ new Date (). toLocaleDateString () } ?` , config : { tools : [ mcpToTool ( client )], // uses the session, will automatically call the tool // Uncomment if you **don't** want the sdk to automatically call the tool // automaticFunctionCalling: { // disable: true, // }, }, }); console . log ( response . text ) // Close the connection await client . close (); Limitations with built-in MCP support Built-in MCP support is an experimental feature in our SDKs and has the following limitations: Only tools are supported, not resources or prompts. It is available in the Python and JavaScript/TypeScript SDKs. Breaking changes might occur in future releases. Manual integration of MCP servers is always an option if these limitations affect what you're building. Supported models This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page. 
Model Function Calling Parallel Function Calling Compositional Function Calling Gemini 2.5 Pro ✔️ ✔️ ✔️ Gemini 2.5 Flash ✔️ ✔️ ✔️ Gemini 2.5 Flash-Lite ✔️ ✔️ ✔️ Gemini 2.0 Flash ✔️ ✔️ ✔️ Gemini 2.0 Flash-Lite X X X Best practices Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments. Naming: Use descriptive function names (without spaces, periods, or dashes). Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum. Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools. Prompt Engineering: Provide context: Tell the model its role (e.g., "You are a helpful weather assistant."). Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts."). Encourage clarification: Instruct the model to ask clarifying questions if needed. Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls. Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it. Error Handling : Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. Return informative error messages that the model can use to generate helpful responses to the user. Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls. Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, or break down complex tasks into smaller, more focused function sets. Notes and limitations Only a subset of the OpenAPI schema is supported. Supported parameter types in Python are limited. Automatic function calling is a Python SDK feature only. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
text_content/docs_function-calling_dbd56c28.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/function-calling#parallel_function_ca
2
  Title: Function calling with the Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Function calling with the Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Function calling with the Gemini API Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. Extend Capabilities: Use external tools to perform computations and extend the limitations of the model, such as using a calculator or creating charts. Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. Get Weather Schedule Meeting Create Chart How function calling works Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: Define Function Declaration: Define the function declaration in your application code. Function Declarations describe the function's name, parameters, and purpose to the model. Call LLM with function declarations: Send user prompt along with the function declaration(s) to the model. It analyzes the request and determines if a function call would be helpful. If so, it responds with a structured JSON object. Execute Function Code (Your Responsibility): The Model does not execute the function itself. It's your application's responsibility to process the response and check for Function Call, if Yes : Extract the name and args of the function and execute the corresponding function in your application. No: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. It will use the result to generate a final, user-friendly response that incorporates the information from the function call. This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn ( parallel function calling ) and in sequence ( compositional function calling ). Step 1: Define a function declaration Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs. Python # Define a function that the model can call to control smart lights set_light_values_declaration = { "name" : "set_light_values" , "description" : "Sets the brightness and color temperature of a light." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "integer" , "description" : "Light level from 0 to 100. 
Zero is off and 100 is full brightness" , }, "color_temp" : { "type" : "string" , "enum" : [ "daylight" , "cool" , "warm" ], "description" : "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`." , }, }, "required" : [ "brightness" , "color_temp" ], }, } # This is the actual function that would be called based on the model's suggestion def set_light_values ( brightness : int , color_temp : str ) - > dict [ str , int | str ]: """Set the brightness and color temperature of a room light. (mock API). Args: brightness: Light level from 0 to 100. Zero is off and 100 is full brightness color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. Returns: A dictionary containing the set brightness and color temperature. """ return { "brightness" : brightness , "colorTemperature" : color_temp } JavaScript import { Type } from '@google/genai' ; // Define a function that the model can call to control smart lights const setLightValuesFunctionDeclaration = { name : 'set_light_values' , description : 'Sets the brightness and color temperature of a light.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'Light level from 0 to 100. Zero is off and 100 is full brightness' , }, color_temp : { type : Type . STRING , enum : [ 'daylight' , 'cool' , 'warm' ], description : 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.' , }, }, required : [ 'brightness' , 'color_temp' ], }, }; /** * Set the brightness and color temperature of a room light. (mock API) * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. * @return {Object} A dictionary containing the set brightness and color temperature. */ function setLightValues ( brightness , color_temp ) { return { brightness : brightness , colorTemperature : color_temp }; } Step 2: Call the model with function declarations Once you have defined your function declarations, you can prompt the model to use them. It analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion. Python from google.genai import types # Configure the client and tools client = genai . Client () tools = types . Tool ( function_declarations = [ set_light_values_declaration ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Define user prompt contents = [ types . Content ( role = "user" , parts = [ types . Part ( text = "Turn the lights down to a romantic level" )] ) ] # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = contents config = config , ) print ( response . candidates [ 0 ] . content . parts [ 0 ] . function_call ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Generation config with function declaration const config = { tools : [{ functionDeclarations : [ setLightValuesFunctionDeclaration ] }] }; // Configure the client const ai = new GoogleGenAI ({}); // Define user prompt const contents = [ { role : 'user' , parts : [{ text : 'Turn the lights down to a romantic level' }] } ]; // Send request with function declarations const response = await ai . models . 
generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( response . functionCalls [ 0 ]); The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question. Python id = None args = { 'color_temp' : 'warm' , 'brightness' : 25 } name = 'set_light_values' JavaScript { name : 'set_light_values' , args : { brightness : 25 , color_temp : 'warm' } } Step 3: Execute set_light_values function code Extract the function call details from the model's response, parse the arguments , and execute the set_light_values function. Python # Extract tool call details, it may not be in the first part. tool_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call if tool_call . name == "set_light_values" : result = set_light_values ( ** tool_call . args ) print ( f "Function execution result: { result } " ) JavaScript // Extract tool call details const tool_call = response . functionCalls [ 0 ] let result ; if ( tool_call . name === 'set_light_values' ) { result = setLightValues ( tool_call . args . brightness , tool_call . args . color_temp ); console . log ( `Function execution result: ${ JSON . stringify ( result ) } ` ); } Step 4: Create user friendly response with function result and call the model again Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. Python # Create a function response part function_response_part = types . Part . from_function_response ( name = tool_call . name , response = { "result" : result }, ) # Append function call and result of the function execution to contents contents . append ( response . candidates [ 0 ] . content ) # Append the content from the model's response. contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Create a function response part const function_response_part = { name : tool_call . name , response : { result } } // Append function call and result of the function execution to contents contents . push ( response . candidates [ 0 ]. content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); // Get the final response from the model const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); This completes the function calling flow. The model successfully used the set_light_values function to perform the request action of the user. Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations . You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. 
Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. Function calling with thinking Enabling "thinking" can improve function call performance by allowing the model to reason through a request before suggesting function calls. However, because the Gemini API is stateless, this reasoning context is lost between turns, which can reduce the quality of function calls as they require multiple turn requests. To preserve this context you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. To use thought signatures: Receive the signature: When thinking is enabled, the API response will include a thought_signature field containing an encrypted representation of the model's reasoning. Return the signature: When you send the function's execution result back to the server, include the thought_signature you received. This allows the model to restore its previous thinking context and will likely result in better function calling performance. Receiving signatures from the server Signatures are returned in the part after the model's thinking phase, which typically is a text or function call. Here are some examples of what thought signatures look like returned in each type of part, in response to the request "What's the weather in Lake Tahoe?" using the Get Weather example: Text part [{ "candidates" : [ { "content" : { "parts" : [ { "text" : "Here's what the weather in Lake Tahoe is today" , "thoughtSignature" : "ClcBVKhc7ru7KzUI7SrdUoIdAYLm/+i93aHjfIt4xHyAoO/G70tApxnK2ujBhOhC1PrRy1pkQa88fqFvpHNVd1HDjNLO7mkp6/hFwE+SPPEB3fh0hs4oM8MKhgIBVKhc7uIGvrS7i/T4HpfbnYrluFfWNjZ62gewqe4cVdR/Dlh+zbjtYmDD0gPZ+SuBO7vvHQdzsjePRP+2Y5XddX6LEf/cGGgakq8EhVvw/a6IVzUO6XmpHg2Ag1sl8E9+VFH/lC0R0ZuYdFWligtDuYwp5p5q3o59G0TtWeU2MC1y2MJfE9u/KWd313ldka80/X2W/xF2O/4djMp5G2WKcULfve75zeRCy0mc5iS3SB9mTH0cT6x0vtKjeBx50gcg+CQWtJcRuwTVzz54dmvmK9xvnqA8gKGw3DuaM9wfy5hyY7Qg0z3iyyWdP8T/lbjKim8IEQOk7O1vVwP1Ko7oMYH8JgA1CsoBAVSoXO6v4c5RSyd1cn6EIU0pEFQsjW7rYWPuZdOFq/tsGJT9BCfW7KGkPGwlNSq8jTJFvbcJ/DjtndISQYXwiXd2kGa5JfdS2Kh4zOxCxiWtOk+2nCc3+XQk2nonhO+esGJpkDdbbHZSqRgcUtYKq7q28iPFOQvOFyCiZNB7K86Z/6Hnagu2snSlN/BcTMaFGaWpcCClSUo4foRZn3WbNCoM8rcpD7qEJMp4a5baaSxyyeL1ZTGd2HLpFys/oiW6e3oAnhxuIysCwg==" } ] , "role" : "model" } , "index" : 0 } ] , # Remainder of response... 
Function call part [{ "candidates" : [ { "content" : { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiwBVKhc7nRyTi3HmggPD9iQiRc261f5jwuMdw3H/itDH0emsb9ZVo3Nwx9p6wpsAVSoXO5i8fDV4jBSBLoaWxB5zUdlGY6aIGp+I0oEnwRRSRQ1LOvrDlojEH8JE8HjiKXALdJrvNPiG+HY3GZEO8pZjEZtc3UoBUh7+SVyjK7Xolu7aRYYeUyzrCapoETWypER1jbrJXnFV23hCosBAVSoXO6oIPNJSmbuEDfGafOhuCSHkpr1yjTp35RXYqmCESzRzWf5+nFXLqncqeFo4ohoxbiYQVpVQbOZF81p8o9zg6xeRE7qMeOv+XN7enXGJ4/s3qNFQpfkSMqRdBITN1VpX7jyfEAjvxBNc7PDfDJZmEPY338ZIY5nFFcmzJSWjVrboFt2sMFv+A==" } ] , "role" : "model" } , "finishReason" : "STOP" , "index" : 0 } ] , # Remainder of response... You can confirm that you received a signature and see what a signature looks like using the following code: # Step 2: Call the model with function declarations # ...Generation config, Configure the client, and Define user prompt (No changes) # Send request with declarations (using a thinking model) response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents ) # See thought signatures for part in response . candidates [ 0 ] . content . parts : if part . thought_signature : print ( "Thought signature:" ) print ( part . thought_signature ) Returning signatures back to the server In order to return signatures back: You should return signatures along with their containing parts back to the server You shouldn't merge a part with a signature with another part which also contains a signature. The signature string is not concatenable You shouldn't merge one part with a signature with another part without a signature. This breaks the correct positioning of the thought represented by the signature. The code will remain the same as in Step 4 of the previous section. But in this case (as indicated in the comment below) you will return signatures to the model along with the result of the function execution so the model can incorporate the thoughts into its final response: Python # Step 4: Create user friendly response with function result and call the model again # ...Create a function response part (No change) # Append thought signatures, function call and result of the function execution to contents function_call_content = response . candidates [ 0 ] . content # Append the model's function call message, which includes thought signatures contents . append ( function_call_content ) contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Step 4: Create user friendly response with function result and call the model again // ...Create a function response part (No change) // Append thought signatures, function call and result of the function execution to contents const function_response_content = response . candidates [ 0 ]. content ; contents . push ( function_response_content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); The following shows what a request returning a thought signature may look like: [{ "contents" : [ { "role" : "user" , "parts" : [ { "text" : "what is the weather in Lake Tahoe?" 
} ] } , { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiIBVKhc7oDPpCaXyJKKssjqr4g3JNOSgJ/M2V+1THC1icsWCmwBVKhc7pBABbZ+zR3e9234WnWWS6GFXmf8IVwpnzjd5KYd7vyJbn/4vTorWBGayj/vbd9JPaZQjxdAIXhoE5mX/MDsQ7M9N/b0qJjHm39tYIBvS4sIWkMDHqTJqXGLzhhKtrTkfbV3RbaJEkQKmwEBVKhc7qVUgC3hfTXZLo9R3AJzUUIx50NKvJTb9B+UU+LBqgg7Nck1x5OpjWVS2R+SsveprIuYOruk2Y0H53J2OJF8qsxTdIq2si8DGW2V7WK8xyoJH5kbqd7drIw1jLb44b6lx4SMyB0VaULuTBki4d+Ljjg1tJTwR0IYMKqDLDZt9mheINsi0ZxcNjfpnDydRXdWbcSwzmK/wgqJAQFUqFzuKgNVElxs3cbO+xebr2IwcOro84nKTisi0tTp9bICPC9fTUhn3L+rvQWA+d3J1Za8at2bakrqiRj7BTh+CVO9fWQMAEQAs3ni0Z2hfaYG92tOD26E4IoZwyYEoWbfNudpH1fr5tEkyqnEGtWIh7H+XoZQ2DXeiOa+br7Zk88SrNE+trJMCogBAVSoXO5e9fBLg7hnbkmKsrzNLnQtLsQm1gNzjcjEC7nJYklYPp0KI2uGBE1PkM8XNsfllAfHVn7LzHcHNlbQ9pJ7QZTSIeG42goS971r5wNZwxaXwCTphClQh826eqJWo6A/28TtAVQWLhTx5ekbP7qb4nh1UblESZ1saxDQAEo4OKPbDzx5BgqKAQFUqFzuVyjNm5i0wN8hTDnKjfpDroEpPPTs531iFy9BOX+xDCdGHy8D+osFpaoBq6TFekQQbz4hIoUR1YEcP4zI80/cNimEeb9IcFxZTTxiNrbhbbcv0969DSMWhB+ZEqIz4vuw4GLe/xcUvqhlChQwFdgIbdOQHSHpatn5uDlktnP/bi26nKuXIwo0AVSoXO7US22OUH7d1f4abNPI0IyAvhqkPp12rbtWLx9vkOtojE8IP+xCfYtIFuZIzRNZqA==" } ] , "role" : "model" } , { "role" : "user" , "parts" : [ { "functionResponse" : { "name" : "getWeather" , "response" : { "response" : { "stringValue" : "Sunny and hot. 90 degrees Fahrenheit" } } } } ] } ] , # Remainder of request... Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page. Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . 
OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' } }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) = > ` ${ key } = ${ val } ` ) . join ( ', ' ); console . log ( ` ${ fn . name } ( ${ args } )` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which automatically converts Python functions to declarations, handles the function call execution and response cycle for you. Following is an example for the disco use case. Note: Automatic Function Calling is a Python SDK only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) - > dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. 
""" return { "status" : f "Disco ball powered { 'on' if power else 'off' } " } def start_music_impl ( energetic : bool , loud : bool ) - > dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) - > dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to this place into party!" , config = config , ) print ( " \n Example 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task. import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) - > dict : """Gets the current weather temperature for a given location.""" print ( f "Tool Call: get_weather_forecast(location= { location } )" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) - > dict : """Sets the thermostat to a desired temperature.""" print ( f "Tool Call: set_thermostat_temperature(temperature= { temperature } )" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. 
The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . I 've set the thermostat to 20°C. JavaScript This example shows how to use JavaScript/TypeScript SDK to do comopositional function calling using a manual execution loop. import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type . NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . It 's 25°C in London, so I' ve set the thermostat to 20 ° C . 
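If you want the same explicit control in Python (for example, to log or validate each call before running it), you can disable automatic function calling and drive the loop yourself. The following is a minimal illustrative sketch, not an official sample: it reuses the get_weather_forecast and set_thermostat_temperature implementations defined above and passes hand-written declarations for them.
Python (illustrative sketch)
from google import genai
from google.genai import types

client = genai.Client()

# Manual declarations for the two example functions defined earlier.
weather_decl = {
    "name": "get_weather_forecast",
    "description": "Gets the current weather temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}
thermostat_decl = {
    "name": "set_thermostat_temperature",
    "description": "Sets the thermostat to a desired temperature.",
    "parameters": {
        "type": "object",
        "properties": {"temperature": {"type": "integer"}},
        "required": ["temperature"],
    },
}

# Map declaration names to the Python implementations shown above.
implementations = {
    "get_weather_forecast": get_weather_forecast,
    "set_thermostat_temperature": set_thermostat_temperature,
}

config = types.GenerateContentConfig(
    tools=[types.Tool(function_declarations=[weather_decl, thermostat_decl])],
    automatic_function_calling=types.AutomaticFunctionCallingConfig(disable=True),
)

contents = [
    types.Content(
        role="user",
        parts=[types.Part(text="If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C.")],
    )
]

# Loop until the model stops requesting function calls.
while True:
    response = client.models.generate_content(
        model="gemini-2.5-flash", contents=contents, config=config
    )
    if not response.function_calls:
        print(response.text)
        break
    # Keep the model's turn (including any thought signatures) in the history.
    contents.append(response.candidates[0].content)
    response_parts = []
    for fn in response.function_calls:
        result = implementations[fn.name](**fn.args)  # execute your own code
        response_parts.append(
            types.Part.from_function_response(name=fn.name, response={"result": result})
        )
    contents.append(types.Content(role="user", parts=response_parts))
The structure mirrors the JavaScript loop above; the trade-off is more boilerplate in exchange for a hook to inspect or reject each requested call before it runs.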
Compositional function calling is a native Live API feature. This means the Live API can handle function calling similarly to the Python SDK. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? ` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , tools = tools , modality = "AUDIO" ) Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here. toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK automatically converts the Python functions to declarations and handles the function call execution and the response cycle for you. The Python SDK then automatically: Detects function call responses from the model. Calls the corresponding Python function in your code. Sends the function response back to the model. Returns the model's final text response. 
To use this, define your function with type hints and a docstring, and then pass the function itself (not a JSON declaration) as a tool: Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) - > dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration Automatic schema extraction from Python functions doesn't work in all cases. For example, it doesn't handle cases where you describe the fields of a nested dictionary-object. The API is able to describe any of the following types: Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | dict [ str , AllowedType ]) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools combining native tools with function calling at the same time. Here's an example that enables two tools, Grounding with Google Search and code execution , in a request using the Live API . Note: Multi-tool use is a- Live API only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! ` ; const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } // not defined here. 
]; // Execute the prompt with specified tools in audio modality await run ( prompt , { tools : tools , modality : "AUDIO" }); Python developers can try this out in the Live API Tool Use notebook . Model context protocol (MCP) Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. The Gemini SDKs have built-in support for the MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDK can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until no more tool calls are made by the model. Here, you can find an example of how to use a local MCP server with Gemini and mcp SDK. Python Make sure the latest version of the mcp SDK is installed on your platform of choice. pip install mcp Note: Python supports automatic tool calling by passing in the ClientSession into the tools parameters. If you want to disable it, you can provide automatic_function_calling with disabled True . import os import asyncio from datetime import datetime from mcp import ClientSession , StdioServerParameters from mcp.client.stdio import stdio_client from google import genai client = genai . Client () # Create server parameters for stdio connection server_params = StdioServerParameters ( command = "npx" , # Executable args = [ "-y" , "@philschmid/weather-mcp" ], # MCP Server env = None , # Optional environment variables ) async def run (): async with stdio_client ( server_params ) as ( read , write ): async with ClientSession ( read , write ) as session : # Prompt to get the weather for the current day in London. prompt = f "What is the weather in London in { datetime . now () . strftime ( '%Y-%m- %d ' ) } ?" # Initialize the connection between client and server await session . initialize () # Send request to the model with MCP function declarations response = await client . aio . models . generate_content ( model = "gemini-2.5-flash" , contents = prompt , config = genai . types . GenerateContentConfig ( temperature = 0 , tools = [ session ], # uses the session, will automatically call the tool # Uncomment if you **don't** want the SDK to automatically call the tool # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( # disable=True # ), ), ) print ( response . text ) # Start the asyncio event loop and run the main function asyncio . run ( run ()) JavaScript Make sure the latest version of the mcp SDK is installed on your platform of choice. npm install @modelcontextprotocol/sdk Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool . If you want to disable it, you can provide automaticFunctionCalling with disabled true . 
import { GoogleGenAI , FunctionCallingConfigMode , mcpToTool } from '@google/genai' ; import { Client } from "@modelcontextprotocol/sdk/client/index.js" ; import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js" ; // Create server parameters for stdio connection const serverParams = new StdioClientTransport ({ command : "npx" , // Executable args : [ "-y" , "@philschmid/weather-mcp" ] // MCP Server }); const client = new Client ( { name : "example-client" , version : "1.0.0" } ); // Configure the client const ai = new GoogleGenAI ({}); // Initialize the connection between client and server await client . connect ( serverParams ); // Send request to the model with MCP tools const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : `What is the weather in London in ${ new Date (). toLocaleDateString () } ?` , config : { tools : [ mcpToTool ( client )], // uses the session, will automatically call the tool // Uncomment if you **don't** want the sdk to automatically call the tool // automaticFunctionCalling: { // disable: true, // }, }, }); console . log ( response . text ) // Close the connection await client . close (); Limitations with built-in MCP support Built-in MCP support is a experimental feature in our SDKs and has the following limitations: Only tools are supported, not resources nor prompts It is available for the Python and JavaScript/TypeScript SDK. Breaking changes might occur in future releases. Manual integration of MCP servers is always an option if these limit what you're building. Supported models This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page. Model Function Calling Parallel Function Calling Compositional Function Calling Gemini 2.5 Pro ✔️ ✔️ ✔️ Gemini 2.5 Flash ✔️ ✔️ ✔️ Gemini 2.5 Flash-Lite ✔️ ✔️ ✔️ Gemini 2.0 Flash ✔️ ✔️ ✔️ Gemini 2.0 Flash-Lite X X X Best practices Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments. Naming: Use descriptive function names (without spaces, periods, or dashes). Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum. Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools. Prompt Engineering: Provide context: Tell the model its role (e.g., "You are a helpful weather assistant."). Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts."). Encourage clarification: Instruct the model to ask clarifying questions if needed. Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls. Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it. Error Handling : Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. 
Return informative error messages that the model can use to generate helpful responses to the user. Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls. Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, break down complex tasks into smaller, more focused function sets. Notes and limitations Only a subset of the OpenAPI schema is supported. Supported parameter types in Python are limited. Automatic function calling is a Python SDK feature only. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-10 UTC.
 
  Title: Function calling with the Gemini API  |  Google AI for Developers
  ==================================================
 
+ Function calling with the Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Function calling with the Gemini API Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. Extend Capabilities: Use external tools to perform computations and extend the limitations of the model, such as using a calculator or creating charts. Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. Get Weather Schedule Meeting Create Chart How function calling works Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: Define Function Declaration: Define the function declaration in your application code. Function Declarations describe the function's name, parameters, and purpose to the model. Call LLM with function declarations: Send user prompt along with the function declaration(s) to the model. It analyzes the request and determines if a function call would be helpful. If so, it responds with a structured JSON object. Execute Function Code (Your Responsibility): The Model does not execute the function itself. It's your application's responsibility to process the response and check for Function Call, if Yes : Extract the name and args of the function and execute the corresponding function in your application. No: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. It will use the result to generate a final, user-friendly response that incorporates the information from the function call. This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn ( parallel function calling ) and in sequence ( compositional function calling ). Step 1: Define a function declaration Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs. Python # Define a function that the model can call to control smart lights set_light_values_declaration = { "name" : "set_light_values" , "description" : "Sets the brightness and color temperature of a light." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "integer" , "description" : "Light level from 0 to 100. 
Zero is off and 100 is full brightness" , }, "color_temp" : { "type" : "string" , "enum" : [ "daylight" , "cool" , "warm" ], "description" : "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`." , }, }, "required" : [ "brightness" , "color_temp" ], }, } # This is the actual function that would be called based on the model's suggestion def set_light_values ( brightness : int , color_temp : str ) - > dict [ str , int | str ]: """Set the brightness and color temperature of a room light. (mock API). Args: brightness: Light level from 0 to 100. Zero is off and 100 is full brightness color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. Returns: A dictionary containing the set brightness and color temperature. """ return { "brightness" : brightness , "colorTemperature" : color_temp } JavaScript import { Type } from '@google/genai' ; // Define a function that the model can call to control smart lights const setLightValuesFunctionDeclaration = { name : 'set_light_values' , description : 'Sets the brightness and color temperature of a light.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'Light level from 0 to 100. Zero is off and 100 is full brightness' , }, color_temp : { type : Type . STRING , enum : [ 'daylight' , 'cool' , 'warm' ], description : 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.' , }, }, required : [ 'brightness' , 'color_temp' ], }, }; /** * Set the brightness and color temperature of a room light. (mock API) * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. * @return {Object} A dictionary containing the set brightness and color temperature. */ function setLightValues ( brightness , color_temp ) { return { brightness : brightness , colorTemperature : color_temp }; } Step 2: Call the model with function declarations Once you have defined your function declarations, you can prompt the model to use them. It analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion. Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () tools = types . Tool ( function_declarations = [ set_light_values_declaration ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Define user prompt contents = [ types . Content ( role = "user" , parts = [ types . Part ( text = "Turn the lights down to a romantic level" )] ) ] # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = contents , config = config , ) print ( response . candidates [ 0 ] . content . parts [ 0 ] . function_call ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Generation config with function declaration const config = { tools : [{ functionDeclarations : [ setLightValuesFunctionDeclaration ] }] }; // Configure the client const ai = new GoogleGenAI ({}); // Define user prompt const contents = [ { role : 'user' , parts : [{ text : 'Turn the lights down to a romantic level' }] } ]; // Send request with function declarations const response = await ai . models . 
generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( response . functionCalls [ 0 ]); The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question. Python id = None args = { 'color_temp' : 'warm' , 'brightness' : 25 } name = 'set_light_values' JavaScript { name : 'set_light_values' , args : { brightness : 25 , color_temp : 'warm' } } Step 3: Execute set_light_values function code Extract the function call details from the model's response, parse the arguments , and execute the set_light_values function. Python # Extract tool call details, it may not be in the first part. tool_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call if tool_call . name == "set_light_values" : result = set_light_values ( ** tool_call . args ) print ( f "Function execution result: { result } " ) JavaScript // Extract tool call details const tool_call = response . functionCalls [ 0 ] let result ; if ( tool_call . name === 'set_light_values' ) { result = setLightValues ( tool_call . args . brightness , tool_call . args . color_temp ); console . log ( `Function execution result: ${ JSON . stringify ( result ) } ` ); } Step 4: Create user friendly response with function result and call the model again Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. Python # Create a function response part function_response_part = types . Part . from_function_response ( name = tool_call . name , response = { "result" : result }, ) # Append function call and result of the function execution to contents contents . append ( response . candidates [ 0 ] . content ) # Append the content from the model's response. contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Create a function response part const function_response_part = { name : tool_call . name , response : { result } } // Append function call and result of the function execution to contents contents . push ( response . candidates [ 0 ]. content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); // Get the final response from the model const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); This completes the function calling flow. The model successfully used the set_light_values function to perform the request action of the user. Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations . You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. 
Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. You can also construct FunctionDeclarations from Python functions directly using types.FunctionDeclaration.from_callable(client=client, callable=your_function) . Function calling with thinking Enabling " thinking " can improve function call performance by allowing the model to reason through a request before suggesting function calls. The Gemini API is stateless, the model's reasoning context will be lost between turns in a multi-turn conversation. To preserve this context, you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. The standard pattern for multi-turn tool use is to append the model's complete previous response to the conversation history. The content object includes the thought_signatures automatically. If you follow this pattern No code changes are required . Manually managing thought signatures If you modify the conversation history manually—instead of sending the complete previous response and want to benefit from thinking you must correctly handle the thought_signature included in the model's turn. Follow these rules to ensure the model's context is preserved: Always send the thought_signature back to the model inside its original Part . Don't merge a Part containing a signature with one that does not. This breaks the positional context of the thought. Don't combine two Parts that both contain signatures, as the signature strings cannot be merged. Inspecting Thought Signatures While not necessary for implementation, you can inspect the response to see the thought_signature for debugging or educational purposes. Python import base64 # After receiving a response from a model with thinking enabled # response = client.models.generate_content(...) # The signature is attached to the response part containing the function call part = response . candidates [ 0 ] . content . parts [ 0 ] if part . thought_signature : print ( base64 . b64encode ( part . thought_signature ) . decode ( "utf-8" )) JavaScript // After receiving a response from a model with thinking enabled // const response = await ai.models.generateContent(...) // The signature is attached to the response part containing the function call const part = response . candidates [ 0 ]. content . parts [ 0 ]; if ( part . thoughtSignature ) { console . log ( part . thoughtSignature ); } Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page. 
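For illustration, here is a minimal sketch of that pattern, assuming a response from a previous turn that contains a function call, plus contents, client, config, and a computed result from the earlier steps; the model's content object is appended unmodified so any thought_signature stays attached to its original Part:
Python (illustrative sketch)
from google.genai import types

# The model's previous turn; its parts may carry thought signatures.
function_call_content = response.candidates[0].content

# Build the function response for the call the model requested.
# Note: the function call may not be in the first part of the response.
tool_call = function_call_content.parts[0].function_call
function_response_part = types.Part.from_function_response(
    name=tool_call.name,
    response={"result": result},  # result produced by your own function
)

# Append the model turn as-is (signatures intact), then the function response.
contents.append(function_call_content)
contents.append(types.Content(role="user", parts=[function_response_part]))

final_response = client.models.generate_content(
    model="gemini-2.5-flash",
    config=config,
    contents=contents,
)
print(final_response.text)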
Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' } }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . 
join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) = > ` ${ key } = ${ val } ` ) . join ( ', ' ); console . log ( ` ${ fn . name } ( ${ args } )` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which automatically converts Python functions to declarations and handles the function call execution and response cycle for you. Following is an example for the disco use case. Note: Automatic Function Calling is a Python SDK-only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) - > dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. """ return { "status" : f "Disco ball powered { 'on' if power else 'off' } " } def start_music_impl ( energetic : bool , loud : bool ) - > dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) - > dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to turn this place into a party!" , config = config , ) print ( " \n Example 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. 
The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task. import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) - > dict : """Gets the current weather temperature for a given location.""" print ( f "Tool Call: get_weather_forecast(location= { location } )" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) - > dict : """Sets the thermostat to a desired temperature.""" print ( f "Tool Call: set_thermostat_temperature(temperature= { temperature } )" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . I 've set the thermostat to 20°C. JavaScript This example shows how to use the JavaScript/TypeScript SDK to do compositional function calling using a manual execution loop. import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type . 
NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the manual loop orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . It 's 25°C in London, so I' ve set the thermostat to 20 ° C . Compositional function calling is a native Live API feature. This means the Live API can handle function calling similarly to the Python SDK. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? ` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , tools = tools , modality = "AUDIO" ) Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. 
If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here. toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK converts these functions into declarations, manages the function call execution, and handles the response cycle for you. Define your function with type hints and a docstring. For optimal results, it is recommended to use Google-style docstrings. The SDK will then automatically: Detect function call responses from the model. Call the corresponding Python function in your code. Send the function's response back to the model. Return the model's final text response. The SDK currently does not parse argument descriptions into the property description slots of the generated function declaration. Instead, it sends the entire docstring as the top-level function description. Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) - > dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration The API is able to describe any of the following types. Pydantic types are allowed, as long as the fields defined on them are also composed of allowed types. Dict types (like dict[str: int] ) are not well supported here, don't use them. Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | pydantic . 
BaseModel ) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools, combining native tools with function calling, at the same time. Here's an example that enables two tools, Grounding with Google Search and code execution , in a request using the Live API . Note: Multi-tool use is a Live API-only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! ` ; const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } // not defined here. ]; // Execute the prompt with specified tools in audio modality await run ( prompt , { tools : tools , modality : "AUDIO" }); Python developers can try this out in the Live API Tool Use notebook . Model context protocol (MCP) Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. The Gemini SDKs have built-in support for the MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDK can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until no more tool calls are made by the model. Here, you can find an example of how to use a local MCP server with Gemini and the mcp SDK. Python Make sure the latest version of the mcp SDK is installed on your platform of choice. pip install mcp Note: Python supports automatic tool calling by passing the ClientSession into the tools parameter. If you want to disable it, you can provide automatic_function_calling with disabled True . import os import asyncio from datetime import datetime from mcp import ClientSession , StdioServerParameters from mcp.client.stdio import stdio_client from google import genai client = genai . 
Client () # Create server parameters for stdio connection server_params = StdioServerParameters ( command = "npx" , # Executable args = [ "-y" , "@philschmid/weather-mcp" ], # MCP Server env = None , # Optional environment variables ) async def run (): async with stdio_client ( server_params ) as ( read , write ): async with ClientSession ( read , write ) as session : # Prompt to get the weather for the current day in London. prompt = f "What is the weather in London in { datetime . now () . strftime ( '%Y-%m- %d ' ) } ?" # Initialize the connection between client and server await session . initialize () # Send request to the model with MCP function declarations response = await client . aio . models . generate_content ( model = "gemini-2.5-flash" , contents = prompt , config = genai . types . GenerateContentConfig ( temperature = 0 , tools = [ session ], # uses the session, will automatically call the tool # Uncomment if you **don't** want the SDK to automatically call the tool # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( # disable=True # ), ), ) print ( response . text ) # Start the asyncio event loop and run the main function asyncio . run ( run ()) JavaScript Make sure the latest version of the mcp SDK is installed on your platform of choice. npm install @modelcontextprotocol/sdk Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool . If you want to disable it, you can provide automaticFunctionCalling with disabled true . import { GoogleGenAI , FunctionCallingConfigMode , mcpToTool } from '@google/genai' ; import { Client } from "@modelcontextprotocol/sdk/client/index.js" ; import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js" ; // Create server parameters for stdio connection const serverParams = new StdioClientTransport ({ command : "npx" , // Executable args : [ "-y" , "@philschmid/weather-mcp" ] // MCP Server }); const client = new Client ( { name : "example-client" , version : "1.0.0" } ); // Configure the client const ai = new GoogleGenAI ({}); // Initialize the connection between client and server await client . connect ( serverParams ); // Send request to the model with MCP tools const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : `What is the weather in London in ${ new Date (). toLocaleDateString () } ?` , config : { tools : [ mcpToTool ( client )], // uses the session, will automatically call the tool // Uncomment if you **don't** want the sdk to automatically call the tool // automaticFunctionCalling: { // disable: true, // }, }, }); console . log ( response . text ) // Close the connection await client . close (); Limitations with built-in MCP support Built-in MCP support is a experimental feature in our SDKs and has the following limitations: Only tools are supported, not resources nor prompts It is available for the Python and JavaScript/TypeScript SDK. Breaking changes might occur in future releases. Manual integration of MCP servers is always an option if these limit what you're building. Supported models This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page. 
Model Function Calling Parallel Function Calling Compositional Function Calling Gemini 2.5 Pro ✔️ ✔️ ✔️ Gemini 2.5 Flash ✔️ ✔️ ✔️ Gemini 2.5 Flash-Lite ✔️ ✔️ ✔️ Gemini 2.0 Flash ✔️ ✔️ ✔️ Gemini 2.0 Flash-Lite X X X Best practices Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments. Naming: Use descriptive function names (without spaces, periods, or dashes). Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum. Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools. Prompt Engineering: Provide context: Tell the model its role (e.g., "You are a helpful weather assistant."). Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts."). Encourage clarification: Instruct the model to ask clarifying questions if needed. Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls. Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it. Error Handling: Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. Return informative error messages that the model can use to generate helpful responses to the user (a brief sketch follows below). Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls. Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, or break complex tasks down into smaller, more focused function sets. Notes and limitations Only a subset of the OpenAPI schema is supported. Supported parameter types in Python are limited. Automatic function calling is a Python SDK feature only. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
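A brief sketch of the Error Handling best practice above. The function name, arguments, and the shape of the error payload here are illustrative assumptions rather than part of the official API; the only requirement is that the returned dictionary gives the model enough information to explain the failure to the user.

Python

def get_weather_forecast(location: str) -> dict:
    """Gets the current weather for a location (hypothetical tool implementation).

    Returns a result dictionary on success, or an informative error
    dictionary that the model can relay back to the user.
    """
    if not location or not location.strip():
        # Return an informative error instead of raising, so the model can
        # ask the user for a valid location in its next response.
        return {"error": "No location was provided. Please specify a city, e.g. 'Boston, MA'."}
    try:
        # TODO: replace this stub with a real weather API call.
        return {"temperature": 25, "unit": "celsius"}
    except Exception as exc:  # e.g. a network failure or upstream API error
        return {"error": f"The weather service could not be reached: {exc}"}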
text_content/docs_function-calling_e09d4451.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/function-calling#thinking
2
  Title: Function calling with the Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Function calling with the Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Function calling with the Gemini API Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. Extend Capabilities: Use external tools to perform computations and extend the limitations of the model, such as using a calculator or creating charts. Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. Get Weather Schedule Meeting Create Chart How function calling works Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: Define Function Declaration: Define the function declaration in your application code. Function Declarations describe the function's name, parameters, and purpose to the model. Call LLM with function declarations: Send user prompt along with the function declaration(s) to the model. It analyzes the request and determines if a function call would be helpful. If so, it responds with a structured JSON object. Execute Function Code (Your Responsibility): The Model does not execute the function itself. It's your application's responsibility to process the response and check for Function Call, if Yes : Extract the name and args of the function and execute the corresponding function in your application. No: The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. It will use the result to generate a final, user-friendly response that incorporates the information from the function call. This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn ( parallel function calling ) and in sequence ( compositional function calling ). Step 1: Define a function declaration Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs. Python # Define a function that the model can call to control smart lights set_light_values_declaration = { "name" : "set_light_values" , "description" : "Sets the brightness and color temperature of a light." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "integer" , "description" : "Light level from 0 to 100. 
Zero is off and 100 is full brightness" , }, "color_temp" : { "type" : "string" , "enum" : [ "daylight" , "cool" , "warm" ], "description" : "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`." , }, }, "required" : [ "brightness" , "color_temp" ], }, } # This is the actual function that would be called based on the model's suggestion def set_light_values ( brightness : int , color_temp : str ) - > dict [ str , int | str ]: """Set the brightness and color temperature of a room light. (mock API). Args: brightness: Light level from 0 to 100. Zero is off and 100 is full brightness color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. Returns: A dictionary containing the set brightness and color temperature. """ return { "brightness" : brightness , "colorTemperature" : color_temp } JavaScript import { Type } from '@google/genai' ; // Define a function that the model can call to control smart lights const setLightValuesFunctionDeclaration = { name : 'set_light_values' , description : 'Sets the brightness and color temperature of a light.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'Light level from 0 to 100. Zero is off and 100 is full brightness' , }, color_temp : { type : Type . STRING , enum : [ 'daylight' , 'cool' , 'warm' ], description : 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.' , }, }, required : [ 'brightness' , 'color_temp' ], }, }; /** * Set the brightness and color temperature of a room light. (mock API) * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. * @return {Object} A dictionary containing the set brightness and color temperature. */ function setLightValues ( brightness , color_temp ) { return { brightness : brightness , colorTemperature : color_temp }; } Step 2: Call the model with function declarations Once you have defined your function declarations, you can prompt the model to use them. It analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion. Python from google.genai import types # Configure the client and tools client = genai . Client () tools = types . Tool ( function_declarations = [ set_light_values_declaration ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Define user prompt contents = [ types . Content ( role = "user" , parts = [ types . Part ( text = "Turn the lights down to a romantic level" )] ) ] # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = contents config = config , ) print ( response . candidates [ 0 ] . content . parts [ 0 ] . function_call ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Generation config with function declaration const config = { tools : [{ functionDeclarations : [ setLightValuesFunctionDeclaration ] }] }; // Configure the client const ai = new GoogleGenAI ({}); // Define user prompt const contents = [ { role : 'user' , parts : [{ text : 'Turn the lights down to a romantic level' }] } ]; // Send request with function declarations const response = await ai . models . 
generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( response . functionCalls [ 0 ]); The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question. Python id = None args = { 'color_temp' : 'warm' , 'brightness' : 25 } name = 'set_light_values' JavaScript { name : 'set_light_values' , args : { brightness : 25 , color_temp : 'warm' } } Step 3: Execute set_light_values function code Extract the function call details from the model's response, parse the arguments , and execute the set_light_values function. Python # Extract tool call details, it may not be in the first part. tool_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call if tool_call . name == "set_light_values" : result = set_light_values ( ** tool_call . args ) print ( f "Function execution result: { result } " ) JavaScript // Extract tool call details const tool_call = response . functionCalls [ 0 ] let result ; if ( tool_call . name === 'set_light_values' ) { result = setLightValues ( tool_call . args . brightness , tool_call . args . color_temp ); console . log ( `Function execution result: ${ JSON . stringify ( result ) } ` ); } Step 4: Create user friendly response with function result and call the model again Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. Python # Create a function response part function_response_part = types . Part . from_function_response ( name = tool_call . name , response = { "result" : result }, ) # Append function call and result of the function execution to contents contents . append ( response . candidates [ 0 ] . content ) # Append the content from the model's response. contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Create a function response part const function_response_part = { name : tool_call . name , response : { result } } // Append function call and result of the function execution to contents contents . push ( response . candidates [ 0 ]. content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); // Get the final response from the model const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); This completes the function calling flow. The model successfully used the set_light_values function to perform the request action of the user. Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations . You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function. 
Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. Function calling with thinking Enabling "thinking" can improve function call performance by allowing the model to reason through a request before suggesting function calls. However, because the Gemini API is stateless, this reasoning context is lost between turns, which can reduce the quality of function calls as they require multiple turn requests. To preserve this context you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. To use thought signatures: Receive the signature: When thinking is enabled, the API response will include a thought_signature field containing an encrypted representation of the model's reasoning. Return the signature: When you send the function's execution result back to the server, include the thought_signature you received. This allows the model to restore its previous thinking context and will likely result in better function calling performance. Receiving signatures from the server Signatures are returned in the part after the model's thinking phase, which typically is a text or function call. Here are some examples of what thought signatures look like returned in each type of part, in response to the request "What's the weather in Lake Tahoe?" using the Get Weather example: Text part [{ "candidates" : [ { "content" : { "parts" : [ { "text" : "Here's what the weather in Lake Tahoe is today" , "thoughtSignature" : "ClcBVKhc7ru7KzUI7SrdUoIdAYLm/+i93aHjfIt4xHyAoO/G70tApxnK2ujBhOhC1PrRy1pkQa88fqFvpHNVd1HDjNLO7mkp6/hFwE+SPPEB3fh0hs4oM8MKhgIBVKhc7uIGvrS7i/T4HpfbnYrluFfWNjZ62gewqe4cVdR/Dlh+zbjtYmDD0gPZ+SuBO7vvHQdzsjePRP+2Y5XddX6LEf/cGGgakq8EhVvw/a6IVzUO6XmpHg2Ag1sl8E9+VFH/lC0R0ZuYdFWligtDuYwp5p5q3o59G0TtWeU2MC1y2MJfE9u/KWd313ldka80/X2W/xF2O/4djMp5G2WKcULfve75zeRCy0mc5iS3SB9mTH0cT6x0vtKjeBx50gcg+CQWtJcRuwTVzz54dmvmK9xvnqA8gKGw3DuaM9wfy5hyY7Qg0z3iyyWdP8T/lbjKim8IEQOk7O1vVwP1Ko7oMYH8JgA1CsoBAVSoXO6v4c5RSyd1cn6EIU0pEFQsjW7rYWPuZdOFq/tsGJT9BCfW7KGkPGwlNSq8jTJFvbcJ/DjtndISQYXwiXd2kGa5JfdS2Kh4zOxCxiWtOk+2nCc3+XQk2nonhO+esGJpkDdbbHZSqRgcUtYKq7q28iPFOQvOFyCiZNB7K86Z/6Hnagu2snSlN/BcTMaFGaWpcCClSUo4foRZn3WbNCoM8rcpD7qEJMp4a5baaSxyyeL1ZTGd2HLpFys/oiW6e3oAnhxuIysCwg==" } ] , "role" : "model" } , "index" : 0 } ] , # Remainder of response... 
Function call part [{ "candidates" : [ { "content" : { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiwBVKhc7nRyTi3HmggPD9iQiRc261f5jwuMdw3H/itDH0emsb9ZVo3Nwx9p6wpsAVSoXO5i8fDV4jBSBLoaWxB5zUdlGY6aIGp+I0oEnwRRSRQ1LOvrDlojEH8JE8HjiKXALdJrvNPiG+HY3GZEO8pZjEZtc3UoBUh7+SVyjK7Xolu7aRYYeUyzrCapoETWypER1jbrJXnFV23hCosBAVSoXO6oIPNJSmbuEDfGafOhuCSHkpr1yjTp35RXYqmCESzRzWf5+nFXLqncqeFo4ohoxbiYQVpVQbOZF81p8o9zg6xeRE7qMeOv+XN7enXGJ4/s3qNFQpfkSMqRdBITN1VpX7jyfEAjvxBNc7PDfDJZmEPY338ZIY5nFFcmzJSWjVrboFt2sMFv+A==" } ] , "role" : "model" } , "finishReason" : "STOP" , "index" : 0 } ] , # Remainder of response... You can confirm that you received a signature and see what a signature looks like using the following code: # Step 2: Call the model with function declarations # ...Generation config, Configure the client, and Define user prompt (No changes) # Send request with declarations (using a thinking model) response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents ) # See thought signatures for part in response . candidates [ 0 ] . content . parts : if part . thought_signature : print ( "Thought signature:" ) print ( part . thought_signature ) Returning signatures back to the server In order to return signatures back: You should return signatures along with their containing parts back to the server You shouldn't merge a part with a signature with another part which also contains a signature. The signature string is not concatenable You shouldn't merge one part with a signature with another part without a signature. This breaks the correct positioning of the thought represented by the signature. The code will remain the same as in Step 4 of the previous section. But in this case (as indicated in the comment below) you will return signatures to the model along with the result of the function execution so the model can incorporate the thoughts into its final response: Python # Step 4: Create user friendly response with function result and call the model again # ...Create a function response part (No change) # Append thought signatures, function call and result of the function execution to contents function_call_content = response . candidates [ 0 ] . content # Append the model's function call message, which includes thought signatures contents . append ( function_call_content ) contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Step 4: Create user friendly response with function result and call the model again // ...Create a function response part (No change) // Append thought signatures, function call and result of the function execution to contents const function_response_content = response . candidates [ 0 ]. content ; contents . push ( function_response_content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); The following shows what a request returning a thought signature may look like: [{ "contents" : [ { "role" : "user" , "parts" : [ { "text" : "what is the weather in Lake Tahoe?" 
} ] } , { "parts" : [ { "functionCall" : { "name" : "getWeather" , "args" : { "city" : "Lake Tahoe" } } , "thoughtSignature" : "CiIBVKhc7oDPpCaXyJKKssjqr4g3JNOSgJ/M2V+1THC1icsWCmwBVKhc7pBABbZ+zR3e9234WnWWS6GFXmf8IVwpnzjd5KYd7vyJbn/4vTorWBGayj/vbd9JPaZQjxdAIXhoE5mX/MDsQ7M9N/b0qJjHm39tYIBvS4sIWkMDHqTJqXGLzhhKtrTkfbV3RbaJEkQKmwEBVKhc7qVUgC3hfTXZLo9R3AJzUUIx50NKvJTb9B+UU+LBqgg7Nck1x5OpjWVS2R+SsveprIuYOruk2Y0H53J2OJF8qsxTdIq2si8DGW2V7WK8xyoJH5kbqd7drIw1jLb44b6lx4SMyB0VaULuTBki4d+Ljjg1tJTwR0IYMKqDLDZt9mheINsi0ZxcNjfpnDydRXdWbcSwzmK/wgqJAQFUqFzuKgNVElxs3cbO+xebr2IwcOro84nKTisi0tTp9bICPC9fTUhn3L+rvQWA+d3J1Za8at2bakrqiRj7BTh+CVO9fWQMAEQAs3ni0Z2hfaYG92tOD26E4IoZwyYEoWbfNudpH1fr5tEkyqnEGtWIh7H+XoZQ2DXeiOa+br7Zk88SrNE+trJMCogBAVSoXO5e9fBLg7hnbkmKsrzNLnQtLsQm1gNzjcjEC7nJYklYPp0KI2uGBE1PkM8XNsfllAfHVn7LzHcHNlbQ9pJ7QZTSIeG42goS971r5wNZwxaXwCTphClQh826eqJWo6A/28TtAVQWLhTx5ekbP7qb4nh1UblESZ1saxDQAEo4OKPbDzx5BgqKAQFUqFzuVyjNm5i0wN8hTDnKjfpDroEpPPTs531iFy9BOX+xDCdGHy8D+osFpaoBq6TFekQQbz4hIoUR1YEcP4zI80/cNimEeb9IcFxZTTxiNrbhbbcv0969DSMWhB+ZEqIz4vuw4GLe/xcUvqhlChQwFdgIbdOQHSHpatn5uDlktnP/bi26nKuXIwo0AVSoXO7US22OUH7d1f4abNPI0IyAvhqkPp12rbtWLx9vkOtojE8IP+xCfYtIFuZIzRNZqA==" } ] , "role" : "model" } , { "role" : "user" , "parts" : [ { "functionResponse" : { "name" : "getWeather" , "response" : { "response" : { "stringValue" : "Sunny and hot. 90 degrees Fahrenheit" } } } } ] } ] , # Remainder of request... Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page. Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . 
OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' } }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) = > ` ${ key } = ${ val } ` ) . join ( ', ' ); console . log ( ` ${ fn . name } ( ${ args } )` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which automatically converts Python functions to declarations, handles the function call execution and response cycle for you. Following is an example for the disco use case. Note: Automatic Function Calling is a Python SDK only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) - > dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. 
""" return { "status" : f "Disco ball powered { 'on' if power else 'off' } " } def start_music_impl ( energetic : bool , loud : bool ) - > dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) - > dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to this place into party!" , config = config , ) print ( " \n Example 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter. The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task. import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) - > dict : """Gets the current weather temperature for a given location.""" print ( f "Tool Call: get_weather_forecast(location= { location } )" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) - > dict : """Sets the thermostat to a desired temperature.""" print ( f "Tool Call: set_thermostat_temperature(temperature= { temperature } )" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. 
The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . I 've set the thermostat to 20°C. JavaScript This example shows how to use JavaScript/TypeScript SDK to do comopositional function calling using a manual execution loop. import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type . NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . It 's 25°C in London, so I' ve set the thermostat to 20 ° C . 
Compositional function calling is a native Live API feature. This means Live API can handle the function calling similar to the Python SDK. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you write run some python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you write run some python code to turn on the lights, wait 10s and then turn off the lights? ` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , tools = tools , modality = "AUDIO" ) Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the. function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations. If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here. toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK automatically converts the Python function to declarations, handles the function call execution and the response cycle for you. The Python SDK then automatically: Detects function call responses from the model. Call the corresponding Python function in your code. Sends the function response back to the model. Returns the model's final text response. 
To use this, define your function with type hints and a docstring, and then pass the function itself (not a JSON declaration) as a tool: Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) - > dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration Automatic schema extraction from Python functions doesn't work in all cases. For example, it doesn't handle cases where you describe the fields of a nested dictionary-object. The API is able to describe any of the following types: Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | dict [ str , AllowedType ]) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools combining native tools with function calling at the same time. Here's an example that enables two tools, Grounding with Google Search and code execution , in a request using the Live API . Note: Multi-tool use is a- Live API only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! ` ; const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } // not defined here. 
]; // Execute the prompt with specified tools in audio modality await run ( prompt , { tools : tools , modality : "AUDIO" }); Python developers can try this out in the Live API Tool Use notebook . Model context protocol (MCP) Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. The Gemini SDKs have built-in support for the MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDK can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until no more tool calls are made by the model. Here, you can find an example of how to use a local MCP server with Gemini and mcp SDK. Python Make sure the latest version of the mcp SDK is installed on your platform of choice. pip install mcp Note: Python supports automatic tool calling by passing in the ClientSession into the tools parameters. If you want to disable it, you can provide automatic_function_calling with disabled True . import os import asyncio from datetime import datetime from mcp import ClientSession , StdioServerParameters from mcp.client.stdio import stdio_client from google import genai client = genai . Client () # Create server parameters for stdio connection server_params = StdioServerParameters ( command = "npx" , # Executable args = [ "-y" , "@philschmid/weather-mcp" ], # MCP Server env = None , # Optional environment variables ) async def run (): async with stdio_client ( server_params ) as ( read , write ): async with ClientSession ( read , write ) as session : # Prompt to get the weather for the current day in London. prompt = f "What is the weather in London in { datetime . now () . strftime ( '%Y-%m- %d ' ) } ?" # Initialize the connection between client and server await session . initialize () # Send request to the model with MCP function declarations response = await client . aio . models . generate_content ( model = "gemini-2.5-flash" , contents = prompt , config = genai . types . GenerateContentConfig ( temperature = 0 , tools = [ session ], # uses the session, will automatically call the tool # Uncomment if you **don't** want the SDK to automatically call the tool # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( # disable=True # ), ), ) print ( response . text ) # Start the asyncio event loop and run the main function asyncio . run ( run ()) JavaScript Make sure the latest version of the mcp SDK is installed on your platform of choice. npm install @modelcontextprotocol/sdk Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool . If you want to disable it, you can provide automaticFunctionCalling with disabled true . 
import { GoogleGenAI , FunctionCallingConfigMode , mcpToTool } from '@google/genai' ; import { Client } from "@modelcontextprotocol/sdk/client/index.js" ; import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js" ; // Create server parameters for stdio connection const serverParams = new StdioClientTransport ({ command : "npx" , // Executable args : [ "-y" , "@philschmid/weather-mcp" ] // MCP Server }); const client = new Client ( { name : "example-client" , version : "1.0.0" } ); // Configure the client const ai = new GoogleGenAI ({}); // Initialize the connection between client and server await client . connect ( serverParams ); // Send request to the model with MCP tools const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : `What is the weather in London in ${ new Date (). toLocaleDateString () } ?` , config : { tools : [ mcpToTool ( client )], // uses the session, will automatically call the tool // Uncomment if you **don't** want the sdk to automatically call the tool // automaticFunctionCalling: { // disable: true, // }, }, }); console . log ( response . text ) // Close the connection await client . close (); Limitations with built-in MCP support Built-in MCP support is a experimental feature in our SDKs and has the following limitations: Only tools are supported, not resources nor prompts It is available for the Python and JavaScript/TypeScript SDK. Breaking changes might occur in future releases. Manual integration of MCP servers is always an option if these limit what you're building. Supported models This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page. Model Function Calling Parallel Function Calling Compositional Function Calling Gemini 2.5 Pro ✔️ ✔️ ✔️ Gemini 2.5 Flash ✔️ ✔️ ✔️ Gemini 2.5 Flash-Lite ✔️ ✔️ ✔️ Gemini 2.0 Flash ✔️ ✔️ ✔️ Gemini 2.0 Flash-Lite X X X Best practices Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments. Naming: Use descriptive function names (without spaces, periods, or dashes). Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum. Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools. Prompt Engineering: Provide context: Tell the model its role (e.g., "You are a helpful weather assistant."). Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts."). Encourage clarification: Instruct the model to ask clarifying questions if needed. Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls. Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it. Error Handling : Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. 
Return informative error messages that the model can use to generate helpful responses to the user. Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls. Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, break down complex tasks into smaller, more focused function sets. Notes and limitations Only a subset of the OpenAPI schema is supported. Supported parameter types in Python are limited. Automatic function calling is a Python SDK feature only. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-10 UTC.
 
2
  Title: Function calling with the Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Function calling with the Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Function calling with the Gemini API Function calling lets you connect models to external tools and APIs. Instead of generating text responses, the model determines when to call specific functions and provides the necessary parameters to execute real-world actions. This allows the model to act as a bridge between natural language and real-world actions and data. Function calling has 3 primary use cases: Augment Knowledge: Access information from external sources like databases, APIs, and knowledge bases. Extend Capabilities: Use external tools to perform computations and extend the limitations of the model, such as using a calculator or creating charts. Take Actions: Interact with external systems using APIs, such as scheduling appointments, creating invoices, sending emails, or controlling smart home devices. Get Weather Schedule Meeting Create Chart How function calling works Function calling involves a structured interaction between your application, the model, and external functions. Here's a breakdown of the process: Define Function Declaration: Define the function declaration in your application code. Function Declarations describe the function's name, parameters, and purpose to the model. Call LLM with function declarations: Send the user prompt along with the function declaration(s) to the model. It analyzes the request and determines if a function call would be helpful. If so, it responds with a structured JSON object. Execute Function Code (Your Responsibility): The Model does not execute the function itself. It's your application's responsibility to process the response and check for a Function Call. If Yes : Extract the name and args of the function and execute the corresponding function in your application. If No : The model has provided a direct text response to the prompt (this flow is less emphasized in the example but is a possible outcome). Create User friendly response: If a function was executed, capture the result and send it back to the model in a subsequent turn of the conversation. It will use the result to generate a final, user-friendly response that incorporates the information from the function call. This process can be repeated over multiple turns, allowing for complex interactions and workflows. The model also supports calling multiple functions in a single turn ( parallel function calling ) and in sequence ( compositional function calling ). Step 1: Define a function declaration Define a function and its declaration within your application code that allows users to set light values and make an API request. This function could call external services or APIs. Python # Define a function that the model can call to control smart lights set_light_values_declaration = { "name" : "set_light_values" , "description" : "Sets the brightness and color temperature of a light." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "integer" , "description" : "Light level from 0 to 100. 
Zero is off and 100 is full brightness" , }, "color_temp" : { "type" : "string" , "enum" : [ "daylight" , "cool" , "warm" ], "description" : "Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`." , }, }, "required" : [ "brightness" , "color_temp" ], }, } # This is the actual function that would be called based on the model's suggestion def set_light_values ( brightness : int , color_temp : str ) - > dict [ str , int | str ]: """Set the brightness and color temperature of a room light. (mock API). Args: brightness: Light level from 0 to 100. Zero is off and 100 is full brightness color_temp: Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. Returns: A dictionary containing the set brightness and color temperature. """ return { "brightness" : brightness , "colorTemperature" : color_temp } JavaScript import { Type } from '@google/genai' ; // Define a function that the model can call to control smart lights const setLightValuesFunctionDeclaration = { name : 'set_light_values' , description : 'Sets the brightness and color temperature of a light.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'Light level from 0 to 100. Zero is off and 100 is full brightness' , }, color_temp : { type : Type . STRING , enum : [ 'daylight' , 'cool' , 'warm' ], description : 'Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`.' , }, }, required : [ 'brightness' , 'color_temp' ], }, }; /** * Set the brightness and color temperature of a room light. (mock API) * @param {number} brightness - Light level from 0 to 100. Zero is off and 100 is full brightness * @param {string} color_temp - Color temperature of the light fixture, which can be `daylight`, `cool` or `warm`. * @return {Object} A dictionary containing the set brightness and color temperature. */ function setLightValues ( brightness , color_temp ) { return { brightness : brightness , colorTemperature : color_temp }; } Step 2: Call the model with function declarations Once you have defined your function declarations, you can prompt the model to use them. It analyzes the prompt and function declarations and decides whether to respond directly or to call a function. If a function is called, the response object will contain a function call suggestion. Python from google.genai import types # Configure the client and tools client = genai . Client () tools = types . Tool ( function_declarations = [ set_light_values_declaration ]) config = types . GenerateContentConfig ( tools = [ tools ]) # Define user prompt contents = [ types . Content ( role = "user" , parts = [ types . Part ( text = "Turn the lights down to a romantic level" )] ) ] # Send request with function declarations response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = contents , config = config , ) print ( response . candidates [ 0 ] . content . parts [ 0 ] . function_call ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Generation config with function declaration const config = { tools : [{ functionDeclarations : [ setLightValuesFunctionDeclaration ] }] }; // Configure the client const ai = new GoogleGenAI ({}); // Define user prompt const contents = [ { role : 'user' , parts : [{ text : 'Turn the lights down to a romantic level' }] } ]; // Send request with function declarations const response = await ai . models . 
generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( response . functionCalls [ 0 ]); The model then returns a functionCall object in an OpenAPI compatible schema specifying how to call one or more of the declared functions in order to respond to the user's question. Python id = None args = { 'color_temp' : 'warm' , 'brightness' : 25 } name = 'set_light_values' JavaScript { name : 'set_light_values' , args : { brightness : 25 , color_temp : 'warm' } } Step 3: Execute set_light_values function code Extract the function call details from the model's response, parse the arguments , and execute the set_light_values function. Python # Extract tool call details; it may not be in the first part. tool_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call if tool_call . name == "set_light_values" : result = set_light_values ( ** tool_call . args ) print ( f "Function execution result: { result } " ) JavaScript // Extract tool call details const tool_call = response . functionCalls [ 0 ] let result ; if ( tool_call . name === 'set_light_values' ) { result = setLightValues ( tool_call . args . brightness , tool_call . args . color_temp ); console . log ( `Function execution result: ${ JSON . stringify ( result ) } ` ); } Step 4: Create a user-friendly response with the function result and call the model again Finally, send the result of the function execution back to the model so it can incorporate this information into its final response to the user. Python # Create a function response part function_response_part = types . Part . from_function_response ( name = tool_call . name , response = { "result" : result }, ) # Append function call and result of the function execution to contents contents . append ( response . candidates [ 0 ] . content ) # Append the content from the model's response. contents . append ( types . Content ( role = "user" , parts = [ function_response_part ])) # Append the function response final_response = client . models . generate_content ( model = "gemini-2.5-flash" , config = config , contents = contents , ) print ( final_response . text ) JavaScript // Create a function response part const function_response_part = { name : tool_call . name , response : { result } } // Append function call and result of the function execution to contents contents . push ( response . candidates [ 0 ]. content ); contents . push ({ role : 'user' , parts : [{ functionResponse : function_response_part }] }); // Get the final response from the model const final_response = await ai . models . generateContent ({ model : 'gemini-2.5-flash' , contents : contents , config : config }); console . log ( final_response . text ); This completes the function calling flow. The model successfully used the set_light_values function to perform the requested action of the user. Function declarations When you implement function calling in a prompt, you create a tools object, which contains one or more function declarations . You define functions using JSON, specifically with a select subset of the OpenAPI schema format. A single function declaration can include the following parameters: name (string): A unique name for the function ( get_weather_forecast , send_email ). Use descriptive names without spaces or special characters (use underscores or camelCase). description (string): A clear and detailed explanation of the function's purpose and capabilities. This is crucial for the model to understand when to use the function.
Be specific and provide examples if helpful ("Finds theaters based on location and optionally movie title which is currently playing in theaters."). parameters (object): Defines the input parameters the function expects. type (string): Specifies the overall data type, such as object . properties (object): Lists individual parameters, each with: type (string): The data type of the parameter, such as string , integer , boolean, array . description (string): A description of the parameter's purpose and format. Provide examples and constraints ("The city and state, e.g., 'San Francisco, CA' or a zip code e.g., '95616'."). enum (array, optional): If the parameter values are from a fixed set, use "enum" to list the allowed values instead of just describing them in the description. This improves accuracy ("enum": ["daylight", "cool", "warm"]). required (array): An array of strings listing the parameter names that are mandatory for the function to operate. You can also construct FunctionDeclarations from Python functions directly using types.FunctionDeclaration.from_callable(client=client, callable=your_function) . Function calling with thinking Enabling " thinking " can improve function call performance by allowing the model to reason through a request before suggesting function calls. The Gemini API is stateless, so the model's reasoning context will be lost between turns in a multi-turn conversation. To preserve this context, you can use thought signatures. A thought signature is an encrypted representation of the model's internal thought process that you pass back to the model on subsequent turns. The standard pattern for multi-turn tool use is to append the model's complete previous response to the conversation history. The content object includes the thought_signatures automatically. If you follow this pattern, no code changes are required. Manually managing thought signatures If you modify the conversation history manually instead of sending the complete previous response, and you still want to benefit from thinking, you must correctly handle the thought_signature included in the model's turn. Follow these rules to ensure the model's context is preserved: Always send the thought_signature back to the model inside its original Part . Don't merge a Part containing a signature with one that does not. This breaks the positional context of the thought. Don't combine two Parts that both contain signatures, as the signature strings cannot be merged. Inspecting Thought Signatures While not necessary for implementation, you can inspect the response to see the thought_signature for debugging or educational purposes. Python import base64 # After receiving a response from a model with thinking enabled # response = client.models.generate_content(...) # The signature is attached to the response part containing the function call part = response . candidates [ 0 ] . content . parts [ 0 ] if part . thought_signature : print ( base64 . b64encode ( part . thought_signature ) . decode ( "utf-8" )) JavaScript // After receiving a response from a model with thinking enabled // const response = await ai.models.generateContent(...) // The signature is attached to the response part containing the function call const part = response . candidates [ 0 ]. content . parts [ 0 ]; if ( part . thoughtSignature ) { console . log ( part . thoughtSignature ); } Learn more about limitations and usage of thought signatures, and about thinking models in general, on the Thinking page.
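Returning to the manual history management rules above, here is a minimal sketch (not part of the original page) of rebuilding the conversation history yourself while keeping thought signatures intact. It assumes the client , config , contents , response , and function_response_part objects from the earlier steps; the key point is that each returned Part is passed back unmodified, so any thought_signature it carries stays in its original position. Python
# A minimal sketch: manually rebuilding history while preserving thought signatures.
# Assumes `client`, `config`, `contents`, `response`, and `function_response_part`
# were created in the earlier steps on this page.
from google.genai import types

model_turn = response.candidates[0].content

# Keep every Part exactly as returned: a Part that carries a thought_signature
# must not be merged with another Part or stripped of its signature.
contents.append(types.Content(role="model", parts=list(model_turn.parts)))

# Send the tool result back in its own user turn, as in Step 4.
contents.append(types.Content(role="user", parts=[function_response_part]))

follow_up = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=contents,
    config=config,
)
print(follow_up.text)
In practice this is equivalent to appending response.candidates[0].content directly, which is the simplest way to keep signatures in place.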
Parallel function calling In addition to single turn function calling, you can also call multiple functions at once. Parallel function calling lets you execute multiple functions at once and is used when the functions are not dependent on each other. This is useful in scenarios like gathering data from multiple independent sources, such as retrieving customer details from different databases or checking inventory levels across various warehouses or performing multiple actions such as converting your apartment into a disco. Python power_disco_ball = { "name" : "power_disco_ball" , "description" : "Powers the spinning disco ball." , "parameters" : { "type" : "object" , "properties" : { "power" : { "type" : "boolean" , "description" : "Whether to turn the disco ball on or off." , } }, "required" : [ "power" ], }, } start_music = { "name" : "start_music" , "description" : "Play some music matching the specified parameters." , "parameters" : { "type" : "object" , "properties" : { "energetic" : { "type" : "boolean" , "description" : "Whether the music is energetic or not." , }, "loud" : { "type" : "boolean" , "description" : "Whether the music is loud or not." , }, }, "required" : [ "energetic" , "loud" ], }, } dim_lights = { "name" : "dim_lights" , "description" : "Dim the lights." , "parameters" : { "type" : "object" , "properties" : { "brightness" : { "type" : "number" , "description" : "The brightness of the lights, 0.0 is off, 1.0 is full." , } }, "required" : [ "brightness" ], }, } JavaScript import { Type } from '@google/genai' ; const powerDiscoBall = { name : 'power_disco_ball' , description : 'Powers the spinning disco ball.' , parameters : { type : Type . OBJECT , properties : { power : { type : Type . BOOLEAN , description : 'Whether to turn the disco ball on or off.' } }, required : [ 'power' ] } }; const startMusic = { name : 'start_music' , description : 'Play some music matching the specified parameters.' , parameters : { type : Type . OBJECT , properties : { energetic : { type : Type . BOOLEAN , description : 'Whether the music is energetic or not.' }, loud : { type : Type . BOOLEAN , description : 'Whether the music is loud or not.' } }, required : [ 'energetic' , 'loud' ] } }; const dimLights = { name : 'dim_lights' , description : 'Dim the lights.' , parameters : { type : Type . OBJECT , properties : { brightness : { type : Type . NUMBER , description : 'The brightness of the lights, 0.0 is off, 1.0 is full.' } }, required : [ 'brightness' ] } }; Configure the function calling mode to allow using all of the specified tools. To learn more, you can read about configuring function calling . Python from google import genai from google.genai import types # Configure the client and tools client = genai . Client () house_tools = [ types . Tool ( function_declarations = [ power_disco_ball , start_music , dim_lights ]) ] config = types . GenerateContentConfig ( tools = house_tools , automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ), # Force the model to call 'any' function, instead of chatting. tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = 'ANY' ) ), ) chat = client . chats . create ( model = "gemini-2.5-flash" , config = config ) response = chat . send_message ( "Turn this place into a party!" ) # Print out each of the function calls requested from this single call print ( "Example 1: Forced function calling" ) for fn in response . function_calls : args = ", " . 
join ( f " { key } = { val } " for key , val in fn . args . items ()) print ( f " { fn . name } ( { args } )" ) JavaScript import { GoogleGenAI } from '@google/genai' ; // Set up function declarations const houseFns = [ powerDiscoBall , startMusic , dimLights ]; const config = { tools : [{ functionDeclarations : houseFns }], // Force the model to call 'any' function, instead of chatting. toolConfig : { functionCallingConfig : { mode : 'any' } } }; // Configure the client const ai = new GoogleGenAI ({}); // Create a chat session const chat = ai . chats . create ({ model : 'gemini-2.5-flash' , config : config }); const response = await chat . sendMessage ({ message : 'Turn this place into a party!' }); // Print out each of the function calls requested from this single call console . log ( "Example 1: Forced function calling" ); for ( const fn of response . functionCalls ) { const args = Object . entries ( fn . args ) . map (([ key , val ]) = > ` ${ key } = ${ val } ` ) . join ( ', ' ); console . log ( ` ${ fn . name } ( ${ args } )` ); } Each of the printed results reflects a single function call that the model has requested. To send the results back, include the responses in the same order as they were requested. The Python SDK supports automatic function calling , which automatically converts Python functions to declarations, and handles the function call execution and response cycle for you. The following is an example for the disco use case. Note: Automatic Function Calling is a Python SDK only feature at the moment. Python from google import genai from google.genai import types # Actual function implementations def power_disco_ball_impl ( power : bool ) - > dict : """Powers the spinning disco ball. Args: power: Whether to turn the disco ball on or off. Returns: A status dictionary indicating the current state. """ return { "status" : f "Disco ball powered { 'on' if power else 'off' } " } def start_music_impl ( energetic : bool , loud : bool ) - > dict : """Play some music matching the specified parameters. Args: energetic: Whether the music is energetic or not. loud: Whether the music is loud or not. Returns: A dictionary containing the music settings. """ music_type = "energetic" if energetic else "chill" volume = "loud" if loud else "quiet" return { "music_type" : music_type , "volume" : volume } def dim_lights_impl ( brightness : float ) - > dict : """Dim the lights. Args: brightness: The brightness of the lights, 0.0 is off, 1.0 is full. Returns: A dictionary containing the new brightness setting. """ return { "brightness" : brightness } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ power_disco_ball_impl , start_music_impl , dim_lights_impl ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Do everything you need to turn this place into a party!" , config = config , ) print ( " \n Example 2: Automatic function calling" ) print ( response . text ) # I've turned on the disco ball, started playing loud and energetic music, and dimmed the lights to 50% brightness. Let's get this party started! Compositional function calling Compositional or sequential function calling allows Gemini to chain multiple function calls together to fulfill a complex request. For example, to answer "Get the temperature in my current location", the Gemini API might first invoke a get_current_location() function followed by a get_weather() function that takes the location as a parameter.
The following example demonstrates how to implement compositional function calling using the Python SDK and automatic function calling. Python This example uses the automatic function calling feature of the google-genai Python SDK. The SDK automatically converts the Python functions to the required schema, executes the function calls when requested by the model, and sends the results back to the model to complete the task. import os from google import genai from google.genai import types # Example Functions def get_weather_forecast ( location : str ) - > dict : """Gets the current weather temperature for a given location.""" print ( f "Tool Call: get_weather_forecast(location= { location } )" ) # TODO: Make API call print ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ) return { "temperature" : 25 , "unit" : "celsius" } # Dummy response def set_thermostat_temperature ( temperature : int ) - > dict : """Sets the thermostat to a desired temperature.""" print ( f "Tool Call: set_thermostat_temperature(temperature= { temperature } )" ) # TODO: Interact with a thermostat API print ( "Tool Response: {'status': 'success'}" ) return { "status" : "success" } # Configure the client and model client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_weather_forecast , set_thermostat_temperature ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , config = config , ) # Print the final, user-facing response print ( response . text ) Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . I 've set the thermostat to 20°C. JavaScript This example shows how to use the JavaScript/TypeScript SDK to do compositional function calling using a manual execution loop. import { GoogleGenAI , Type } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Example Functions function get_weather_forecast ({ location }) { console . log ( `Tool Call: get_weather_forecast(location= ${ location } )` ); // TODO: Make API call console . log ( "Tool Response: {'temperature': 25, 'unit': 'celsius'}" ); return { temperature : 25 , unit : "celsius" }; } function set_thermostat_temperature ({ temperature }) { console . log ( `Tool Call: set_thermostat_temperature(temperature= ${ temperature } )` , ); // TODO: Make API call console . log ( "Tool Response: {'status': 'success'}" ); return { status : "success" }; } const toolFunctions = { get_weather_forecast , set_thermostat_temperature , }; const tools = [ { functionDeclarations : [ { name : "get_weather_forecast" , description : "Gets the current weather temperature for a given location." , parameters : { type : Type . OBJECT , properties : { location : { type : Type . STRING , }, }, required : [ "location" ], }, }, { name : "set_thermostat_temperature" , description : "Sets the thermostat to a desired temperature." , parameters : { type : Type . OBJECT , properties : { temperature : { type : Type .
NUMBER , }, }, required : [ "temperature" ], }, }, ], }, ]; // Prompt for the model let contents = [ { role : "user" , parts : [ { text : "If it's warmer than 20°C in London, set the thermostat to 20°C, otherwise set it to 18°C." , }, ], }, ]; // Loop until the model has no more function calls to make while ( true ) { const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents , config : { tools }, }); if ( result . functionCalls && result . functionCalls . length > 0 ) { const functionCall = result . functionCalls [ 0 ]; const { name , args } = functionCall ; if ( ! toolFunctions [ name ]) { throw new Error ( `Unknown function call: ${ name } ` ); } // Call the function and get the response. const toolResponse = toolFunctions [ name ]( args ); const functionResponsePart = { name : functionCall . name , response : { result : toolResponse , }, }; // Send the function response back to the model. contents . push ({ role : "model" , parts : [ { functionCall : functionCall , }, ], }); contents . push ({ role : "user" , parts : [ { functionResponse : functionResponsePart , }, ], }); } else { // No more function calls, break the loop. console . log ( result . text ); break ; } } Expected Output When you run the code, you will see the SDK orchestrating the function calls. The model first calls get_weather_forecast , receives the temperature, and then calls set_thermostat_temperature with the correct value based on the logic in the prompt. Tool Call : get_weather_forecast ( location = London ) Tool Response : { 'temperature' : 25 , 'unit' : 'celsius' } Tool Call : set_thermostat_temperature ( temperature = 20 ) Tool Response : { 'status' : 'success' } OK . It 's 25°C in London, so I' ve set the thermostat to 20 ° C . Compositional function calling is a native Live API feature. This means the Live API can handle function calling similarly to the Python SDK. Python # Light control schemas turn_on_the_lights_schema = { 'name' : 'turn_on_the_lights' } turn_off_the_lights_schema = { 'name' : 'turn_off_the_lights' } prompt = """ Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? """ tools = [ { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} ] await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Light control schemas const turnOnTheLightsSchema = { name : 'turn_on_the_lights' }; const turnOffTheLightsSchema = { name : 'turn_off_the_lights' }; const prompt = ` Hey, can you write and run some python code to turn on the lights, wait 10s and then turn off the lights? ` ; const tools = [ { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } ]; await run ( prompt , { tools : tools , modality : "AUDIO" }); Function calling modes The Gemini API lets you control how the model uses the provided tools (function declarations). Specifically, you can set the mode within the function_calling_config . AUTO (Default) : The model decides whether to generate a natural language response or suggest a function call based on the prompt and context. This is the most flexible mode and recommended for most scenarios. ANY : The model is constrained to always predict a function call and guarantees function schema adherence. If allowed_function_names is not specified, the model can choose from any of the provided function declarations.
If allowed_function_names is provided as a list, the model can only choose from the functions in that list. Use this mode when you require a function call response to every prompt (if applicable). NONE : The model is prohibited from making function calls. This is equivalent to sending a request without any function declarations. Use this to temporarily disable function calling without removing your tool definitions. Python from google.genai import types # Configure function calling mode tool_config = types . ToolConfig ( function_calling_config = types . FunctionCallingConfig ( mode = "ANY" , allowed_function_names = [ "get_current_temperature" ] ) ) # Create the generation config config = types . GenerateContentConfig ( tools = [ tools ], # not defined here. tool_config = tool_config , ) JavaScript import { FunctionCallingConfigMode } from '@google/genai' ; // Configure function calling mode const toolConfig = { functionCallingConfig : { mode : FunctionCallingConfigMode . ANY , allowedFunctionNames : [ 'get_current_temperature' ] } }; // Create the generation config const config = { tools : tools , // not defined here. toolConfig : toolConfig , }; Automatic function calling (Python only) When using the Python SDK, you can provide Python functions directly as tools. The SDK converts these functions into declarations, manages the function call execution, and handles the response cycle for you. Define your function with type hints and a docstring. For optimal results, it is recommended to use Google-style docstrings. The SDK will then automatically: Detect function call responses from the model. Call the corresponding Python function in your code. Send the function's response back to the model. Return the model's final text response. The SDK currently does not parse argument descriptions into the property description slots of the generated function declaration. Instead, it sends the entire docstring as the top-level function description. Python from google import genai from google.genai import types # Define the function with type hints and docstring def get_current_temperature ( location : str ) - > dict : """Gets the current temperature for a given location. Args: location: The city and state, e.g. San Francisco, CA Returns: A dictionary containing the temperature and unit. """ # ... (implementation) ... return { "temperature" : 25 , "unit" : "Celsius" } # Configure the client client = genai . Client () config = types . GenerateContentConfig ( tools = [ get_current_temperature ] ) # Pass the function itself # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "What's the temperature in Boston?" , config = config , ) print ( response . text ) # The SDK handles the function call and returns the final text You can disable automatic function calling with: Python config = types . GenerateContentConfig ( tools = [ get_current_temperature ], automatic_function_calling = types . AutomaticFunctionCallingConfig ( disable = True ) ) Automatic function schema declaration The API is able to describe any of the following types. Pydantic types are allowed, as long as the fields defined on them are also composed of allowed types. Dict types (like dict[str: int] ) are not well supported here, don't use them. Python AllowedType = ( int | float | bool | str | list [ 'AllowedType' ] | pydantic . 
BaseModel ) To see what the inferred schema looks like, you can convert it using from_callable : Python def multiply ( a : float , b : float ): """Returns a * b.""" return a * b fn_decl = types . FunctionDeclaration . from_callable ( callable = multiply , client = client ) # to_json_dict() provides a clean JSON representation. print ( fn_decl . to_json_dict ()) Multi-tool use: Combine native tools with function calling You can enable multiple tools, combining native tools with function calling, at the same time. Here's an example that enables two tools, Grounding with Google Search and code execution , in a request using the Live API . Note: Multi-tool use is a Live API-only feature at the moment. The run() function declaration, which handles the asynchronous websocket setup, is omitted for brevity. Python # Multiple tasks example - combining lights, code execution, and search prompt = """ Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! """ tools = [ { 'google_search' : {}}, { 'code_execution' : {}}, { 'function_declarations' : [ turn_on_the_lights_schema , turn_off_the_lights_schema ]} # not defined here. ] # Execute the prompt with specified tools in audio modality await run ( prompt , tools = tools , modality = "AUDIO" ) JavaScript // Multiple tasks example - combining lights, code execution, and search const prompt = ` Hey, I need you to do three things for me. 1. Turn on the lights. 2. Then compute the largest prime palindrome under 100000. 3. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024. Thanks! ` ; const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turnOnTheLightsSchema , turnOffTheLightsSchema ] } // not defined here. ]; // Execute the prompt with specified tools in audio modality await run ( prompt , { tools : tools , modality : "AUDIO" }); Python developers can try this out in the Live API Tool Use notebook . Model context protocol (MCP) Model Context Protocol (MCP) is an open standard for connecting AI applications with external tools and data. MCP provides a common protocol for models to access context, such as functions (tools), data sources (resources), or predefined prompts. The Gemini SDKs have built-in support for MCP, reducing boilerplate code and offering automatic tool calling for MCP tools. When the model generates an MCP tool call, the Python and JavaScript client SDK can automatically execute the MCP tool and send the response back to the model in a subsequent request, continuing this loop until no more tool calls are made by the model. Here, you can find an example of how to use a local MCP server with Gemini and the mcp SDK. Python Make sure the latest version of the mcp SDK is installed on your platform of choice. pip install mcp Note: Python supports automatic tool calling by passing the ClientSession into the tools parameter. If you want to disable it, you can provide automatic_function_calling with disable set to True . import os import asyncio from datetime import datetime from mcp import ClientSession , StdioServerParameters from mcp.client.stdio import stdio_client from google import genai client = genai .
Client () # Create server parameters for stdio connection server_params = StdioServerParameters ( command = "npx" , # Executable args = [ "-y" , "@philschmid/weather-mcp" ], # MCP Server env = None , # Optional environment variables ) async def run (): async with stdio_client ( server_params ) as ( read , write ): async with ClientSession ( read , write ) as session : # Prompt to get the weather for the current day in London. prompt = f "What is the weather in London in { datetime . now () . strftime ( '%Y-%m- %d ' ) } ?" # Initialize the connection between client and server await session . initialize () # Send request to the model with MCP function declarations response = await client . aio . models . generate_content ( model = "gemini-2.5-flash" , contents = prompt , config = genai . types . GenerateContentConfig ( temperature = 0 , tools = [ session ], # uses the session, will automatically call the tool # Uncomment if you **don't** want the SDK to automatically call the tool # automatic_function_calling=genai.types.AutomaticFunctionCallingConfig( # disable=True # ), ), ) print ( response . text ) # Start the asyncio event loop and run the main function asyncio . run ( run ()) JavaScript Make sure the latest version of the mcp SDK is installed on your platform of choice. npm install @modelcontextprotocol/sdk Note: JavaScript supports automatic tool calling by wrapping the client with mcpToTool . If you want to disable it, you can provide automaticFunctionCalling with disable set to true . import { GoogleGenAI , FunctionCallingConfigMode , mcpToTool } from '@google/genai' ; import { Client } from "@modelcontextprotocol/sdk/client/index.js" ; import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js" ; // Create server parameters for stdio connection const serverParams = new StdioClientTransport ({ command : "npx" , // Executable args : [ "-y" , "@philschmid/weather-mcp" ] // MCP Server }); const client = new Client ( { name : "example-client" , version : "1.0.0" } ); // Configure the client const ai = new GoogleGenAI ({}); // Initialize the connection between client and server await client . connect ( serverParams ); // Send request to the model with MCP tools const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : `What is the weather in London in ${ new Date (). toLocaleDateString () } ?` , config : { tools : [ mcpToTool ( client )], // uses the session, will automatically call the tool // Uncomment if you **don't** want the sdk to automatically call the tool // automaticFunctionCalling: { // disable: true, // }, }, }); console . log ( response . text ) // Close the connection await client . close (); Limitations with built-in MCP support Built-in MCP support is an experimental feature in our SDKs and has the following limitations: Only tools are supported, not resources or prompts. It is available for the Python and JavaScript/TypeScript SDKs. Breaking changes might occur in future releases. Manual integration of MCP servers is always an option if these limitations affect what you're building. Supported models This section lists models and their function calling capabilities. Experimental models are not included. You can find a comprehensive capabilities overview on the model overview page.
Model Function Calling Parallel Function Calling Compositional Function Calling Gemini 2.5 Pro ✔️ ✔️ ✔️ Gemini 2.5 Flash ✔️ ✔️ ✔️ Gemini 2.5 Flash-Lite ✔️ ✔️ ✔️ Gemini 2.0 Flash ✔️ ✔️ ✔️ Gemini 2.0 Flash-Lite X X X Best practices Function and Parameter Descriptions: Be extremely clear and specific in your descriptions. The model relies on these to choose the correct function and provide appropriate arguments. Naming: Use descriptive function names (without spaces, periods, or dashes). Strong Typing: Use specific types (integer, string, enum) for parameters to reduce errors. If a parameter has a limited set of valid values, use an enum. Tool Selection: While the model can use an arbitrary number of tools, providing too many can increase the risk of selecting an incorrect or suboptimal tool. For best results, aim to provide only the relevant tools for the context or task, ideally keeping the active set to a maximum of 10-20. Consider dynamic tool selection based on conversation context if you have a large total number of tools. Prompt Engineering: Provide context: Tell the model its role (e.g., "You are a helpful weather assistant."). Give instructions: Specify how and when to use functions (e.g., "Don't guess dates; always use a future date for forecasts."). Encourage clarification: Instruct the model to ask clarifying questions if needed. Temperature: Use a low temperature (e.g., 0) for more deterministic and reliable function calls. Validation: If a function call has significant consequences (e.g., placing an order), validate the call with the user before executing it. Error Handling : Implement robust error handling in your functions to gracefully handle unexpected inputs or API failures. Return informative error messages that the model can use to generate helpful responses to the user. Security: Be mindful of security when calling external APIs. Use appropriate authentication and authorization mechanisms. Avoid exposing sensitive data in function calls. Token Limits: Function descriptions and parameters count towards your input token limit. If you're hitting token limits, consider limiting the number of functions or the length of the descriptions, or breaking down complex tasks into smaller, more focused function sets. Notes and limitations Only a subset of the OpenAPI schema is supported. Supported parameter types in Python are limited. Automatic function calling is a Python SDK feature only. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
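To tie the best practices above together, here is a minimal sketch (not part of the original page; the get_weather_forecast declaration and the prompt wording are illustrative) that combines a clear role instruction, a low temperature for more deterministic calls, and a restricted tool set via allowed_function_names in ANY mode. Python
# A minimal sketch combining several best practices: a clear role instruction,
# a low temperature, and a restricted tool set. The declaration is illustrative;
# reuse your own, well-described declarations in practice.
from google import genai
from google.genai import types

client = genai.Client()

get_weather_forecast = {
    "name": "get_weather_forecast",
    "description": "Gets the current weather temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA",
            },
        },
        "required": ["location"],
    },
}

config = types.GenerateContentConfig(
    system_instruction="You are a helpful weather assistant. Ask a clarifying question if no location is given.",
    temperature=0,  # More deterministic, reliable function calls.
    tools=[types.Tool(function_declarations=[get_weather_forecast])],
    tool_config=types.ToolConfig(
        function_calling_config=types.FunctionCallingConfig(
            mode="ANY",  # Require a function call for every prompt.
            allowed_function_names=["get_weather_forecast"],
        )
    ),
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's the weather in Boston?",
    config=config,
)
print(response.function_calls)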
text_content/docs_google-search_5a931e9e.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/google-search
2
  Title: Grounding with Google Search  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Grounding with Google Search | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Grounding with Google Search Grounding with Google Search connects the Gemini model to real-time web content and works with all available languages . This allows Gemini to provide more accurate answers and cite verifiable sources beyond its knowledge cutoff. Grounding helps you build applications that can: Increase factual accuracy: Reduce model hallucinations by basing responses on real-world information. Access real-time information: Answer questions about recent events and topics. Provide citations: Build user trust by showing the sources for the model's claims. Python from google import genai from google.genai import types # Configure the client client = genai . Client () # Define the grounding tool grounding_tool = types . Tool ( google_search = types . GoogleSearch () ) # Configure generation settings config = types . GenerateContentConfig ( tools = [ grounding_tool ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Who won the euro 2024?" , config = config , ) # Print the grounded response print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Define the grounding tool const groundingTool = { googleSearch : {}, }; // Configure generation settings const config = { tools : [ groundingTool ], }; // Make the request const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : "Who won the euro 2024?" , config , }); // Print the grounded response console . log ( response . text ); REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -X POST \ -d '{ "contents": [ { "parts": [ {"text": "Who won the euro 2024?"} ] } ], "tools": [ { "google_search": {} } ] }' You can learn more by trying the Search tool notebook . How grounding with Google Search works When you enable the google_search tool, the model handles the entire workflow of searching, processing, and citing information automatically. User Prompt: Your application sends a user's prompt to the Gemini API with the google_search tool enabled. Prompt Analysis: The model analyzes the prompt and determines if a Google Search can improve the answer. Google Search: If needed, the model automatically generates one or multiple search queries and executes them. Search Results Processing: The model processes the search results, synthesizes the information, and formulates a response. Grounded Response: The API returns a final, user-friendly response that is grounded in the search results. This response includes the model's text answer and groundingMetadata with the search queries, web results, and citations. Understanding the Grounding Response When a response is successfully grounded, the response includes a groundingMetadata field. This structured data is essential for verifying claims and building a rich citation experience in your application. 
{ "candidates" : [ { "content" : { "parts" : [ { "text" : "Spain won Euro 2024, defeating England 2-1 in the final. This victory marks Spain's record fourth European Championship title." } ], "role" : "model" }, "groundingMetadata" : { "webSearchQueries" : [ "UEFA Euro 2024 winner" , "who won euro 2024" ], "searchEntryPoint" : { "renderedContent" : "<!-- HTML and CSS for the search widget -->" }, "groundingChunks" : [ { "web" : { "uri" : "https://vertexaisearch.cloud.google.com....." , "title" : "aljazeera.com" }}, { "web" : { "uri" : "https://vertexaisearch.cloud.google.com....." , "title" : "uefa.com" }} ], "groundingSupports" : [ { "segment" : { "startIndex" : 0 , "endIndex" : 85 , "text" : "Spain won Euro 2024, defeatin..." }, "groundingChunkIndices" : [ 0 ] }, { "segment" : { "startIndex" : 86 , "endIndex" : 210 , "text" : "This victory marks Spain's..." }, "groundingChunkIndices" : [ 0 , 1 ] } ] } } ] } The Gemini API returns the following information with the groundingMetadata : webSearchQueries : Array of the search queries used. This is useful for debugging and understanding the model's reasoning process. searchEntryPoint : Contains the HTML and CSS to render the required Search Suggestions. Full usage requirements are detailed in the Terms of Service . groundingChunks : Array of objects containing the web sources ( uri and title ). groundingSupports : Array of chunks to connect model response text to the sources in groundingChunks . Each chunk links a text segment (defined by startIndex and endIndex ) to one or more groundingChunkIndices . This is the key to building inline citations. Grounding with Google Search can also be used in combination with the URL context tool to ground responses in both public web data and the specific URLs you provide. Attributing Sources with inline Citations The API returns structured citation data, giving you complete control over how you display sources in your user interface. You can use the groundingSupports and groundingChunks fields to link the model's statements directly to their sources. Here is a common pattern for processing the metadata to create a response with inline, clickable citations. Python def add_citations ( response ): text = response . text supports = response . candidates [ 0 ] . grounding_metadata . grounding_supports chunks = response . candidates [ 0 ] . grounding_metadata . grounding_chunks # Sort supports by end_index in descending order to avoid shifting issues when inserting. sorted_supports = sorted ( supports , key = lambda s : s . segment . end_index , reverse = True ) for support in sorted_supports : end_index = support . segment . end_index if support . grounding_chunk_indices : # Create citation string like [1](link1)[2](link2) citation_links = [] for i in support . grounding_chunk_indices : if i < len ( chunks ): uri = chunks [ i ] . web . uri citation_links . append ( f "[ { i + 1 } ]( { uri } )" ) citation_string = ", " . join ( citation_links ) text = text [: end_index ] + citation_string + text [ end_index :] return text # Assuming response with grounding metadata text_with_citations = add_citations ( response ) print ( text_with_citations ) JavaScript function addCitations ( response ) { let text = response . text ; const supports = response . candidates [ 0 ] ? . groundingMetadata ? . groundingSupports ; const chunks = response . candidates [ 0 ] ? . groundingMetadata ? . groundingChunks ; // Sort supports by end_index in descending order to avoid shifting issues when inserting. const sortedSupports = [... 
supports ]. sort ( ( a , b ) = > ( b . segment ? . endIndex ?? 0 ) - ( a . segment ? . endIndex ?? 0 ), ); for ( const support of sortedSupports ) { const endIndex = support . segment ? . endIndex ; if ( endIndex === undefined || ! support . groundingChunkIndices ? . length ) { continue ; } const citationLinks = support . groundingChunkIndices . map ( i = > { const uri = chunks [ i ] ? . web ? . uri ; if ( uri ) { return `[ ${ i + 1 } ]( ${ uri } )` ; } return null ; }) . filter ( Boolean ); if ( citationLinks . length > 0 ) { const citationString = citationLinks . join ( ", " ); text = text . slice ( 0 , endIndex ) + citationString + text . slice ( endIndex ); } } return text ; } const textWithCitations = addCitations ( response ); console . log ( textWithCitations ); The new response with inline citations will look like this: Spain won Euro 2024, defeating England 2-1 in the final.[1](https:/...), [2](https:/...), [4](https:/...), [5](https:/...) This victory marks Spain's record-breaking fourth European Championship title.[5]((https:/...), [2](https:/...), [3](https:/...), [4](https:/...) Pricing When you use Grounding with Google Search, your project is billed per API request that includes the google_search tool. If the model decides to execute multiple search queries to answer a single prompt (for example, searching for "UEFA Euro 2024 winner" and "Spain vs England Euro 2024 final score" within the same API call), this counts as a single billable use of the tool for that request. For detailed pricing information, see the Gemini API pricing page . Supported Models Experimental and Preview models are not included. You can find their capabilities on the model overview page. Model Grounding with Google Search Gemini 2.5 Pro ✔️ Gemini 2.5 Flash ✔️ Gemini 2.0 Flash ✔️ Gemini 1.5 Pro ✔️ Gemini 1.5 Flash ✔️ Note: Older models use a google_search_retrieval tool. For all current models, use the google_search tool as shown in the examples. Grounding with Gemini 1.5 Models (Legacy) While the google_search tool is recommended for Gemini 2.0 and later, Gemini 1.5 support a legacy tool named google_search_retrieval . This tool provides a dynamic mode that allows the model to decide whether to perform a search based on its confidence that the prompt requires fresh information. If the model's confidence is above a dynamic_threshold you set (a value between 0.0 and 1.0), it will perform a search. Python # Note: This is a legacy approach for Gemini 1.5 models. # The 'google_search' tool is recommended for all new development. import os from google import genai from google.genai import types client = genai . Client () retrieval_tool = types . Tool ( google_search_retrieval = types . GoogleSearchRetrieval ( dynamic_retrieval_config = types . DynamicRetrievalConfig ( mode = types . DynamicRetrievalConfigMode . MODE_DYNAMIC , dynamic_threshold = 0.7 # Only search if confidence > 70% ) ) ) config = types . GenerateContentConfig ( tools = [ retrieval_tool ] ) response = client . models . generate_content ( model = 'gemini-1.5-flash' , contents = "Who won the euro 2024?" , config = config , ) print ( response . text ) if not response . candidates [ 0 ] . grounding_metadata : print ( " \n Model answered from its own knowledge." ) JavaScript // Note: This is a legacy approach for Gemini 1.5 models. // The 'googleSearch' tool is recommended for all new development. 
import { GoogleGenAI , DynamicRetrievalConfigMode } from "@google/genai" ; const ai = new GoogleGenAI ({}); const retrievalTool = { googleSearchRetrieval : { dynamicRetrievalConfig : { mode : DynamicRetrievalConfigMode . MODE_DYNAMIC , dynamicThreshold : 0.7 , // Only search if confidence > 70% }, }, }; const config = { tools : [ retrievalTool ], }; const response = await ai . models . generateContent ({ model : "gemini-1.5-flash" , contents : "Who won the euro 2024?" , config , }); console . log ( response . text ); if ( ! response . candidates ? .[ 0 ] ? . groundingMetadata ) { console . log ( "\nModel answered from its own knowledge." ); } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -X POST \ -d '{ "contents": [ {"parts": [{"text": "Who won the euro 2024?"}]} ], "tools": [{ "google_search_retrieval": { "dynamic_retrieval_config": { "mode": "MODE_DYNAMIC", "dynamic_threshold": 0.7 } } }] }' What's next Try the Grounding with Google Search in the Gemini API Cookbook . Learn about other available tools, like Function Calling . Learn how to augment prompts with specific URLs using the URL context tool . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
 
2
  Title: Grounding with Google Search  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Grounding with Google Search | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Grounding with Google Search Grounding with Google Search connects the Gemini model to real-time web content and works with all available languages . This allows Gemini to provide more accurate answers and cite verifiable sources beyond its knowledge cutoff. Grounding helps you build applications that can: Increase factual accuracy: Reduce model hallucinations by basing responses on real-world information. Access real-time information: Answer questions about recent events and topics. Provide citations: Build user trust by showing the sources for the model's claims. Python from google import genai from google.genai import types # Configure the client client = genai . Client () # Define the grounding tool grounding_tool = types . Tool ( google_search = types . GoogleSearch () ) # Configure generation settings config = types . GenerateContentConfig ( tools = [ grounding_tool ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Who won the euro 2024?" , config = config , ) # Print the grounded response print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Define the grounding tool const groundingTool = { googleSearch : {}, }; // Configure generation settings const config = { tools : [ groundingTool ], }; // Make the request const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : "Who won the euro 2024?" , config , }); // Print the grounded response console . log ( response . text ); REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -X POST \ -d '{ "contents": [ { "parts": [ {"text": "Who won the euro 2024?"} ] } ], "tools": [ { "google_search": {} } ] }' You can learn more by trying the Search tool notebook . How grounding with Google Search works When you enable the google_search tool, the model handles the entire workflow of searching, processing, and citing information automatically. User Prompt: Your application sends a user's prompt to the Gemini API with the google_search tool enabled. Prompt Analysis: The model analyzes the prompt and determines if a Google Search can improve the answer. Google Search: If needed, the model automatically generates one or multiple search queries and executes them. Search Results Processing: The model processes the search results, synthesizes the information, and formulates a response. Grounded Response: The API returns a final, user-friendly response that is grounded in the search results. This response includes the model's text answer and groundingMetadata with the search queries, web results, and citations. Understanding the Grounding Response When a response is successfully grounded, the response includes a groundingMetadata field. This structured data is essential for verifying claims and building a rich citation experience in your application. 
{ "candidates" : [ { "content" : { "parts" : [ { "text" : "Spain won Euro 2024, defeating England 2-1 in the final. This victory marks Spain's record fourth European Championship title." } ], "role" : "model" }, "groundingMetadata" : { "webSearchQueries" : [ "UEFA Euro 2024 winner" , "who won euro 2024" ], "searchEntryPoint" : { "renderedContent" : "<!-- HTML and CSS for the search widget -->" }, "groundingChunks" : [ { "web" : { "uri" : "https://vertexaisearch.cloud.google.com....." , "title" : "aljazeera.com" }}, { "web" : { "uri" : "https://vertexaisearch.cloud.google.com....." , "title" : "uefa.com" }} ], "groundingSupports" : [ { "segment" : { "startIndex" : 0 , "endIndex" : 85 , "text" : "Spain won Euro 2024, defeatin..." }, "groundingChunkIndices" : [ 0 ] }, { "segment" : { "startIndex" : 86 , "endIndex" : 210 , "text" : "This victory marks Spain's..." }, "groundingChunkIndices" : [ 0 , 1 ] } ] } } ] } The Gemini API returns the following information with the groundingMetadata : webSearchQueries : Array of the search queries used. This is useful for debugging and understanding the model's reasoning process. searchEntryPoint : Contains the HTML and CSS to render the required Search Suggestions. Full usage requirements are detailed in the Terms of Service . groundingChunks : Array of objects containing the web sources ( uri and title ). groundingSupports : Array of chunks to connect model response text to the sources in groundingChunks . Each chunk links a text segment (defined by startIndex and endIndex ) to one or more groundingChunkIndices . This is the key to building inline citations. Grounding with Google Search can also be used in combination with the URL context tool to ground responses in both public web data and the specific URLs you provide. Attributing Sources with inline Citations The API returns structured citation data, giving you complete control over how you display sources in your user interface. You can use the groundingSupports and groundingChunks fields to link the model's statements directly to their sources. Here is a common pattern for processing the metadata to create a response with inline, clickable citations. Python def add_citations ( response ): text = response . text supports = response . candidates [ 0 ] . grounding_metadata . grounding_supports chunks = response . candidates [ 0 ] . grounding_metadata . grounding_chunks # Sort supports by end_index in descending order to avoid shifting issues when inserting. sorted_supports = sorted ( supports , key = lambda s : s . segment . end_index , reverse = True ) for support in sorted_supports : end_index = support . segment . end_index if support . grounding_chunk_indices : # Create citation string like [1](link1)[2](link2) citation_links = [] for i in support . grounding_chunk_indices : if i < len ( chunks ): uri = chunks [ i ] . web . uri citation_links . append ( f "[ { i + 1 } ]( { uri } )" ) citation_string = ", " . join ( citation_links ) text = text [: end_index ] + citation_string + text [ end_index :] return text # Assuming response with grounding metadata text_with_citations = add_citations ( response ) print ( text_with_citations ) JavaScript function addCitations ( response ) { let text = response . text ; const supports = response . candidates [ 0 ] ? . groundingMetadata ? . groundingSupports ; const chunks = response . candidates [ 0 ] ? . groundingMetadata ? . groundingChunks ; // Sort supports by end_index in descending order to avoid shifting issues when inserting. const sortedSupports = [... 
supports ]. sort ( ( a , b ) = > ( b . segment ? . endIndex ?? 0 ) - ( a . segment ? . endIndex ?? 0 ), ); for ( const support of sortedSupports ) { const endIndex = support . segment ? . endIndex ; if ( endIndex === undefined || ! support . groundingChunkIndices ? . length ) { continue ; } const citationLinks = support . groundingChunkIndices . map ( i = > { const uri = chunks [ i ] ? . web ? . uri ; if ( uri ) { return `[ ${ i + 1 } ]( ${ uri } )` ; } return null ; }) . filter ( Boolean ); if ( citationLinks . length > 0 ) { const citationString = citationLinks . join ( ", " ); text = text . slice ( 0 , endIndex ) + citationString + text . slice ( endIndex ); } } return text ; } const textWithCitations = addCitations ( response ); console . log ( textWithCitations ); The new response with inline citations will look like this: Spain won Euro 2024, defeating England 2-1 in the final.[1](https:/...), [2](https:/...), [4](https:/...), [5](https:/...) This victory marks Spain's record-breaking fourth European Championship title.[5](https:/...), [2](https:/...), [3](https:/...), [4](https:/...) Pricing When you use Grounding with Google Search, your project is billed per API request that includes the google_search tool. If the model decides to execute multiple search queries to answer a single prompt (for example, searching for "UEFA Euro 2024 winner" and "Spain vs England Euro 2024 final score" within the same API call), this counts as a single billable use of the tool for that request. For detailed pricing information, see the Gemini API pricing page . Supported Models Experimental and Preview models are not included. You can find their capabilities on the model overview page. Model Grounding with Google Search Gemini 2.5 Pro ✔️ Gemini 2.5 Flash ✔️ Gemini 2.5 Flash-Lite ✔️ Gemini 2.0 Flash ✔️ Gemini 1.5 Pro ✔️ Gemini 1.5 Flash ✔️ Note: Older models use a google_search_retrieval tool. For all current models, use the google_search tool as shown in the examples. Grounding with Gemini 1.5 Models (Legacy) While the google_search tool is recommended for Gemini 2.0 and later, Gemini 1.5 models support a legacy tool named google_search_retrieval . This tool provides a dynamic mode that allows the model to decide whether to perform a search based on its confidence that the prompt requires fresh information. If the model's confidence is above a dynamic_threshold you set (a value between 0.0 and 1.0), it will perform a search. Python # Note: This is a legacy approach for Gemini 1.5 models. # The 'google_search' tool is recommended for all new development. import os from google import genai from google.genai import types client = genai . Client () retrieval_tool = types . Tool ( google_search_retrieval = types . GoogleSearchRetrieval ( dynamic_retrieval_config = types . DynamicRetrievalConfig ( mode = types . DynamicRetrievalConfigMode . MODE_DYNAMIC , dynamic_threshold = 0.7 # Only search if confidence > 70% ) ) ) config = types . GenerateContentConfig ( tools = [ retrieval_tool ] ) response = client . models . generate_content ( model = 'gemini-1.5-flash' , contents = "Who won the euro 2024?" , config = config , ) print ( response . text ) if not response . candidates [ 0 ] . grounding_metadata : print ( " \n Model answered from its own knowledge." ) JavaScript // Note: This is a legacy approach for Gemini 1.5 models. // The 'googleSearch' tool is recommended for all new development.
import { GoogleGenAI , DynamicRetrievalConfigMode } from "@google/genai" ; const ai = new GoogleGenAI ({}); const retrievalTool = { googleSearchRetrieval : { dynamicRetrievalConfig : { mode : DynamicRetrievalConfigMode . MODE_DYNAMIC , dynamicThreshold : 0.7 , // Only search if confidence > 70% }, }, }; const config = { tools : [ retrievalTool ], }; const response = await ai . models . generateContent ({ model : "gemini-1.5-flash" , contents : "Who won the euro 2024?" , config , }); console . log ( response . text ); if ( ! response . candidates ? .[ 0 ] ? . groundingMetadata ) { console . log ( "\nModel answered from its own knowledge." ); } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -X POST \ -d '{ "contents": [ {"parts": [{"text": "Who won the euro 2024?"}]} ], "tools": [{ "google_search_retrieval": { "dynamic_retrieval_config": { "mode": "MODE_DYNAMIC", "dynamic_threshold": 0.7 } } }] }' What's next Try the Grounding with Google Search in the Gemini API Cookbook . Learn about other available tools, like Function Calling . Learn how to augment prompts with specific URLs using the URL context tool . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
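As noted above, Grounding with Google Search can be combined with the URL context tool in a single request. The following is a rough sketch, assuming the installed google-genai version exposes the URL context tool as types.UrlContext (check your SDK version); the prompt wording and the chosen URL are illustrative. Python
# A rough sketch (see the assumptions in the lead-in) combining Google Search
# grounding with the URL context tool in one request.
from google import genai
from google.genai import types

client = genai.Client()

config = types.GenerateContentConfig(
    tools=[
        types.Tool(google_search=types.GoogleSearch()),
        types.Tool(url_context=types.UrlContext()),  # Assumed available in your SDK version.
    ]
)

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=(
        "Summarize https://ai.google.dev/gemini-api/docs/google-search "
        "and add any more recent updates you can find about the google_search tool."
    ),
    config=config,
)
print(response.text)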
text_content/docs_grounding_25ac83ea.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/grounding
2
  Title: Grounding with Google Search  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Grounding with Google Search | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Grounding with Google Search Grounding with Google Search connects the Gemini model to real-time web content and works with all available languages . This allows Gemini to provide more accurate answers and cite verifiable sources beyond its knowledge cutoff. Grounding helps you build applications that can: Increase factual accuracy: Reduce model hallucinations by basing responses on real-world information. Access real-time information: Answer questions about recent events and topics. Provide citations: Build user trust by showing the sources for the model's claims. Python from google import genai from google.genai import types # Configure the client client = genai . Client () # Define the grounding tool grounding_tool = types . Tool ( google_search = types . GoogleSearch () ) # Configure generation settings config = types . GenerateContentConfig ( tools = [ grounding_tool ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Who won the euro 2024?" , config = config , ) # Print the grounded response print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Define the grounding tool const groundingTool = { googleSearch : {}, }; // Configure generation settings const config = { tools : [ groundingTool ], }; // Make the request const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : "Who won the euro 2024?" , config , }); // Print the grounded response console . log ( response . text ); REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -X POST \ -d '{ "contents": [ { "parts": [ {"text": "Who won the euro 2024?"} ] } ], "tools": [ { "google_search": {} } ] }' You can learn more by trying the Search tool notebook . How grounding with Google Search works When you enable the google_search tool, the model handles the entire workflow of searching, processing, and citing information automatically. User Prompt: Your application sends a user's prompt to the Gemini API with the google_search tool enabled. Prompt Analysis: The model analyzes the prompt and determines if a Google Search can improve the answer. Google Search: If needed, the model automatically generates one or multiple search queries and executes them. Search Results Processing: The model processes the search results, synthesizes the information, and formulates a response. Grounded Response: The API returns a final, user-friendly response that is grounded in the search results. This response includes the model's text answer and groundingMetadata with the search queries, web results, and citations. Understanding the Grounding Response When a response is successfully grounded, the response includes a groundingMetadata field. This structured data is essential for verifying claims and building a rich citation experience in your application. 
{ "candidates" : [ { "content" : { "parts" : [ { "text" : "Spain won Euro 2024, defeating England 2-1 in the final. This victory marks Spain's record fourth European Championship title." } ], "role" : "model" }, "groundingMetadata" : { "webSearchQueries" : [ "UEFA Euro 2024 winner" , "who won euro 2024" ], "searchEntryPoint" : { "renderedContent" : "<!-- HTML and CSS for the search widget -->" }, "groundingChunks" : [ { "web" : { "uri" : "https://vertexaisearch.cloud.google.com....." , "title" : "aljazeera.com" }}, { "web" : { "uri" : "https://vertexaisearch.cloud.google.com....." , "title" : "uefa.com" }} ], "groundingSupports" : [ { "segment" : { "startIndex" : 0 , "endIndex" : 85 , "text" : "Spain won Euro 2024, defeatin..." }, "groundingChunkIndices" : [ 0 ] }, { "segment" : { "startIndex" : 86 , "endIndex" : 210 , "text" : "This victory marks Spain's..." }, "groundingChunkIndices" : [ 0 , 1 ] } ] } } ] } The Gemini API returns the following information with the groundingMetadata : webSearchQueries : Array of the search queries used. This is useful for debugging and understanding the model's reasoning process. searchEntryPoint : Contains the HTML and CSS to render the required Search Suggestions. Full usage requirements are detailed in the Terms of Service . groundingChunks : Array of objects containing the web sources ( uri and title ). groundingSupports : Array of chunks to connect model response text to the sources in groundingChunks . Each chunk links a text segment (defined by startIndex and endIndex ) to one or more groundingChunkIndices . This is the key to building inline citations. Grounding with Google Search can also be used in combination with the URL context tool to ground responses in both public web data and the specific URLs you provide. Attributing Sources with inline Citations The API returns structured citation data, giving you complete control over how you display sources in your user interface. You can use the groundingSupports and groundingChunks fields to link the model's statements directly to their sources. Here is a common pattern for processing the metadata to create a response with inline, clickable citations. Python def add_citations ( response ): text = response . text supports = response . candidates [ 0 ] . grounding_metadata . grounding_supports chunks = response . candidates [ 0 ] . grounding_metadata . grounding_chunks # Sort supports by end_index in descending order to avoid shifting issues when inserting. sorted_supports = sorted ( supports , key = lambda s : s . segment . end_index , reverse = True ) for support in sorted_supports : end_index = support . segment . end_index if support . grounding_chunk_indices : # Create citation string like [1](link1)[2](link2) citation_links = [] for i in support . grounding_chunk_indices : if i < len ( chunks ): uri = chunks [ i ] . web . uri citation_links . append ( f "[ { i + 1 } ]( { uri } )" ) citation_string = ", " . join ( citation_links ) text = text [: end_index ] + citation_string + text [ end_index :] return text # Assuming response with grounding metadata text_with_citations = add_citations ( response ) print ( text_with_citations ) JavaScript function addCitations ( response ) { let text = response . text ; const supports = response . candidates [ 0 ] ? . groundingMetadata ? . groundingSupports ; const chunks = response . candidates [ 0 ] ? . groundingMetadata ? . groundingChunks ; // Sort supports by end_index in descending order to avoid shifting issues when inserting. const sortedSupports = [... 
supports ]. sort ( ( a , b ) = > ( b . segment ? . endIndex ?? 0 ) - ( a . segment ? . endIndex ?? 0 ), ); for ( const support of sortedSupports ) { const endIndex = support . segment ? . endIndex ; if ( endIndex === undefined || ! support . groundingChunkIndices ? . length ) { continue ; } const citationLinks = support . groundingChunkIndices . map ( i = > { const uri = chunks [ i ] ? . web ? . uri ; if ( uri ) { return `[ ${ i + 1 } ]( ${ uri } )` ; } return null ; }) . filter ( Boolean ); if ( citationLinks . length > 0 ) { const citationString = citationLinks . join ( ", " ); text = text . slice ( 0 , endIndex ) + citationString + text . slice ( endIndex ); } } return text ; } const textWithCitations = addCitations ( response ); console . log ( textWithCitations ); The new response with inline citations will look like this: Spain won Euro 2024, defeating England 2-1 in the final.[1](https:/...), [2](https:/...), [4](https:/...), [5](https:/...) This victory marks Spain's record-breaking fourth European Championship title.[5]((https:/...), [2](https:/...), [3](https:/...), [4](https:/...) Pricing When you use Grounding with Google Search, your project is billed per API request that includes the google_search tool. If the model decides to execute multiple search queries to answer a single prompt (for example, searching for "UEFA Euro 2024 winner" and "Spain vs England Euro 2024 final score" within the same API call), this counts as a single billable use of the tool for that request. For detailed pricing information, see the Gemini API pricing page . Supported Models Experimental and Preview models are not included. You can find their capabilities on the model overview page. Model Grounding with Google Search Gemini 2.5 Pro ✔️ Gemini 2.5 Flash ✔️ Gemini 2.0 Flash ✔️ Gemini 1.5 Pro ✔️ Gemini 1.5 Flash ✔️ Note: Older models use a google_search_retrieval tool. For all current models, use the google_search tool as shown in the examples. Grounding with Gemini 1.5 Models (Legacy) While the google_search tool is recommended for Gemini 2.0 and later, Gemini 1.5 support a legacy tool named google_search_retrieval . This tool provides a dynamic mode that allows the model to decide whether to perform a search based on its confidence that the prompt requires fresh information. If the model's confidence is above a dynamic_threshold you set (a value between 0.0 and 1.0), it will perform a search. Python # Note: This is a legacy approach for Gemini 1.5 models. # The 'google_search' tool is recommended for all new development. import os from google import genai from google.genai import types client = genai . Client () retrieval_tool = types . Tool ( google_search_retrieval = types . GoogleSearchRetrieval ( dynamic_retrieval_config = types . DynamicRetrievalConfig ( mode = types . DynamicRetrievalConfigMode . MODE_DYNAMIC , dynamic_threshold = 0.7 # Only search if confidence > 70% ) ) ) config = types . GenerateContentConfig ( tools = [ retrieval_tool ] ) response = client . models . generate_content ( model = 'gemini-1.5-flash' , contents = "Who won the euro 2024?" , config = config , ) print ( response . text ) if not response . candidates [ 0 ] . grounding_metadata : print ( " \n Model answered from its own knowledge." ) JavaScript // Note: This is a legacy approach for Gemini 1.5 models. // The 'googleSearch' tool is recommended for all new development. 
import { GoogleGenAI , DynamicRetrievalConfigMode } from "@google/genai" ; const ai = new GoogleGenAI ({}); const retrievalTool = { googleSearchRetrieval : { dynamicRetrievalConfig : { mode : DynamicRetrievalConfigMode . MODE_DYNAMIC , dynamicThreshold : 0.7 , // Only search if confidence > 70% }, }, }; const config = { tools : [ retrievalTool ], }; const response = await ai . models . generateContent ({ model : "gemini-1.5-flash" , contents : "Who won the euro 2024?" , config , }); console . log ( response . text ); if ( ! response . candidates ? .[ 0 ] ? . groundingMetadata ) { console . log ( "\nModel answered from its own knowledge." ); } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -X POST \ -d '{ "contents": [ {"parts": [{"text": "Who won the euro 2024?"}]} ], "tools": [{ "google_search_retrieval": { "dynamic_retrieval_config": { "mode": "MODE_DYNAMIC", "dynamic_threshold": 0.7 } } }] }' What's next Try the Grounding with Google Search in the Gemini API Cookbook . Learn about other available tools, like Function Calling . Learn how to augment prompts with specific URLs using the URL context tool . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
 
2
  Title: Grounding with Google Search  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Grounding with Google Search | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Grounding with Google Search Grounding with Google Search connects the Gemini model to real-time web content and works with all available languages . This allows Gemini to provide more accurate answers and cite verifiable sources beyond its knowledge cutoff. Grounding helps you build applications that can: Increase factual accuracy: Reduce model hallucinations by basing responses on real-world information. Access real-time information: Answer questions about recent events and topics. Provide citations: Build user trust by showing the sources for the model's claims. Python from google import genai from google.genai import types # Configure the client client = genai . Client () # Define the grounding tool grounding_tool = types . Tool ( google_search = types . GoogleSearch () ) # Configure generation settings config = types . GenerateContentConfig ( tools = [ grounding_tool ] ) # Make the request response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = "Who won the euro 2024?" , config = config , ) # Print the grounded response print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; // Configure the client const ai = new GoogleGenAI ({}); // Define the grounding tool const groundingTool = { googleSearch : {}, }; // Configure generation settings const config = { tools : [ groundingTool ], }; // Make the request const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : "Who won the euro 2024?" , config , }); // Print the grounded response console . log ( response . text ); REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -X POST \ -d '{ "contents": [ { "parts": [ {"text": "Who won the euro 2024?"} ] } ], "tools": [ { "google_search": {} } ] }' You can learn more by trying the Search tool notebook . How grounding with Google Search works When you enable the google_search tool, the model handles the entire workflow of searching, processing, and citing information automatically. User Prompt: Your application sends a user's prompt to the Gemini API with the google_search tool enabled. Prompt Analysis: The model analyzes the prompt and determines if a Google Search can improve the answer. Google Search: If needed, the model automatically generates one or multiple search queries and executes them. Search Results Processing: The model processes the search results, synthesizes the information, and formulates a response. Grounded Response: The API returns a final, user-friendly response that is grounded in the search results. This response includes the model's text answer and groundingMetadata with the search queries, web results, and citations. Understanding the Grounding Response When a response is successfully grounded, the response includes a groundingMetadata field. This structured data is essential for verifying claims and building a rich citation experience in your application. 
{ "candidates" : [ { "content" : { "parts" : [ { "text" : "Spain won Euro 2024, defeating England 2-1 in the final. This victory marks Spain's record fourth European Championship title." } ], "role" : "model" }, "groundingMetadata" : { "webSearchQueries" : [ "UEFA Euro 2024 winner" , "who won euro 2024" ], "searchEntryPoint" : { "renderedContent" : "<!-- HTML and CSS for the search widget -->" }, "groundingChunks" : [ { "web" : { "uri" : "https://vertexaisearch.cloud.google.com....." , "title" : "aljazeera.com" }}, { "web" : { "uri" : "https://vertexaisearch.cloud.google.com....." , "title" : "uefa.com" }} ], "groundingSupports" : [ { "segment" : { "startIndex" : 0 , "endIndex" : 85 , "text" : "Spain won Euro 2024, defeatin..." }, "groundingChunkIndices" : [ 0 ] }, { "segment" : { "startIndex" : 86 , "endIndex" : 210 , "text" : "This victory marks Spain's..." }, "groundingChunkIndices" : [ 0 , 1 ] } ] } } ] } The Gemini API returns the following information with the groundingMetadata : webSearchQueries : Array of the search queries used. This is useful for debugging and understanding the model's reasoning process. searchEntryPoint : Contains the HTML and CSS to render the required Search Suggestions. Full usage requirements are detailed in the Terms of Service . groundingChunks : Array of objects containing the web sources ( uri and title ). groundingSupports : Array of chunks to connect model response text to the sources in groundingChunks . Each chunk links a text segment (defined by startIndex and endIndex ) to one or more groundingChunkIndices . This is the key to building inline citations. Grounding with Google Search can also be used in combination with the URL context tool to ground responses in both public web data and the specific URLs you provide. Attributing Sources with inline Citations The API returns structured citation data, giving you complete control over how you display sources in your user interface. You can use the groundingSupports and groundingChunks fields to link the model's statements directly to their sources. Here is a common pattern for processing the metadata to create a response with inline, clickable citations. Python def add_citations ( response ): text = response . text supports = response . candidates [ 0 ] . grounding_metadata . grounding_supports chunks = response . candidates [ 0 ] . grounding_metadata . grounding_chunks # Sort supports by end_index in descending order to avoid shifting issues when inserting. sorted_supports = sorted ( supports , key = lambda s : s . segment . end_index , reverse = True ) for support in sorted_supports : end_index = support . segment . end_index if support . grounding_chunk_indices : # Create citation string like [1](link1)[2](link2) citation_links = [] for i in support . grounding_chunk_indices : if i < len ( chunks ): uri = chunks [ i ] . web . uri citation_links . append ( f "[ { i + 1 } ]( { uri } )" ) citation_string = ", " . join ( citation_links ) text = text [: end_index ] + citation_string + text [ end_index :] return text # Assuming response with grounding metadata text_with_citations = add_citations ( response ) print ( text_with_citations ) JavaScript function addCitations ( response ) { let text = response . text ; const supports = response . candidates [ 0 ] ? . groundingMetadata ? . groundingSupports ; const chunks = response . candidates [ 0 ] ? . groundingMetadata ? . groundingChunks ; // Sort supports by end_index in descending order to avoid shifting issues when inserting. const sortedSupports = [... 
supports ]. sort ( ( a , b ) = > ( b . segment ? . endIndex ?? 0 ) - ( a . segment ? . endIndex ?? 0 ), ); for ( const support of sortedSupports ) { const endIndex = support . segment ? . endIndex ; if ( endIndex === undefined || ! support . groundingChunkIndices ? . length ) { continue ; } const citationLinks = support . groundingChunkIndices . map ( i = > { const uri = chunks [ i ] ? . web ? . uri ; if ( uri ) { return `[ ${ i + 1 } ]( ${ uri } )` ; } return null ; }) . filter ( Boolean ); if ( citationLinks . length > 0 ) { const citationString = citationLinks . join ( ", " ); text = text . slice ( 0 , endIndex ) + citationString + text . slice ( endIndex ); } } return text ; } const textWithCitations = addCitations ( response ); console . log ( textWithCitations ); The new response with inline citations will look like this: Spain won Euro 2024, defeating England 2-1 in the final.[1](https:/...), [2](https:/...), [4](https:/...), [5](https:/...) This victory marks Spain's record-breaking fourth European Championship title.[5](https:/...), [2](https:/...), [3](https:/...), [4](https:/...) Pricing When you use Grounding with Google Search, your project is billed per API request that includes the google_search tool. If the model decides to execute multiple search queries to answer a single prompt (for example, searching for "UEFA Euro 2024 winner" and "Spain vs England Euro 2024 final score" within the same API call), this counts as a single billable use of the tool for that request. For detailed pricing information, see the Gemini API pricing page . Supported Models Experimental and Preview models are not included. You can find their capabilities on the model overview page. Model Grounding with Google Search Gemini 2.5 Pro ✔️ Gemini 2.5 Flash ✔️ Gemini 2.5 Flash-Lite ✔️ Gemini 2.0 Flash ✔️ Gemini 1.5 Pro ✔️ Gemini 1.5 Flash ✔️ Note: Older models use a google_search_retrieval tool. For all current models, use the google_search tool as shown in the examples. Grounding with Gemini 1.5 Models (Legacy) While the google_search tool is recommended for Gemini 2.0 and later, Gemini 1.5 models support a legacy tool named google_search_retrieval . This tool provides a dynamic mode that allows the model to decide whether to perform a search based on its confidence that the prompt requires fresh information. If the model's confidence is above a dynamic_threshold you set (a value between 0.0 and 1.0), it will perform a search. Python # Note: This is a legacy approach for Gemini 1.5 models. # The 'google_search' tool is recommended for all new development. import os from google import genai from google.genai import types client = genai . Client () retrieval_tool = types . Tool ( google_search_retrieval = types . GoogleSearchRetrieval ( dynamic_retrieval_config = types . DynamicRetrievalConfig ( mode = types . DynamicRetrievalConfigMode . MODE_DYNAMIC , dynamic_threshold = 0.7 # Only search if confidence > 70% ) ) ) config = types . GenerateContentConfig ( tools = [ retrieval_tool ] ) response = client . models . generate_content ( model = 'gemini-1.5-flash' , contents = "Who won the euro 2024?" , config = config , ) print ( response . text ) if not response . candidates [ 0 ] . grounding_metadata : print ( " \n Model answered from its own knowledge." ) JavaScript // Note: This is a legacy approach for Gemini 1.5 models. // The 'googleSearch' tool is recommended for all new development. 
import { GoogleGenAI , DynamicRetrievalConfigMode } from "@google/genai" ; const ai = new GoogleGenAI ({}); const retrievalTool = { googleSearchRetrieval : { dynamicRetrievalConfig : { mode : DynamicRetrievalConfigMode . MODE_DYNAMIC , dynamicThreshold : 0.7 , // Only search if confidence > 70% }, }, }; const config = { tools : [ retrievalTool ], }; const response = await ai . models . generateContent ({ model : "gemini-1.5-flash" , contents : "Who won the euro 2024?" , config , }); console . log ( response . text ); if ( ! response . candidates ? .[ 0 ] ? . groundingMetadata ) { console . log ( "\nModel answered from its own knowledge." ); } REST curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -X POST \ -d '{ "contents": [ {"parts": [{"text": "Who won the euro 2024?"}]} ], "tools": [{ "google_search_retrieval": { "dynamic_retrieval_config": { "mode": "MODE_DYNAMIC", "dynamic_threshold": 0.7 } } }] }' What's next Try the Grounding with Google Search in the Gemini API Cookbook . Learn about other available tools, like Function Calling . Learn how to augment prompts with specific URLs using the URL context tool . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
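The searchEntryPoint.renderedContent returned with every grounded response has to be shown alongside the answer to satisfy the Search Suggestions display requirement mentioned above. As a rough sketch only, assuming the response object from the google_search examples and assuming the Python SDK mirrors the JSON field names as search_entry_point and rendered_content (an assumption, not a documented guarantee), you could save the widget next to the model's answer like this:

# Sketch: write the grounded answer plus the required Search Suggestions
# widget (HTML + CSS provided by the API) to a simple HTML page.
metadata = response.candidates[0].grounding_metadata
if metadata and metadata.search_entry_point:
    widget_html = metadata.search_entry_point.rendered_content  # assumed field names
    with open("grounded_answer.html", "w", encoding="utf-8") as f:
        f.write("<html><body>")
        f.write(f"<p>{response.text}</p>")
        f.write(widget_html)
        f.write("</body></html>")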
text_content/docs_image-generation_45e977a9.txt ADDED
@@ -0,0 +1,5 @@
1
+ URL: https://ai.google.dev/gemini-api/docs/image-generation#prompt-guide
2
+ Title: Image generation with Gemini (aka Nano Banana)  |  Gemini API  |  Google AI for Developers
3
+ ==================================================
4
+
5
+ Image generation with Gemini (aka Nano Banana) | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Image generation with Gemini (aka Nano Banana) Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both allowing you to create, edit, and iterate on visuals with unprecedented control: Text-to-Image: Generate high-quality images from simple or complex text descriptions. Image + Text-to-Image (Editing): Provide an image and use text prompts to add, remove, or modify elements, change the style, or adjust the color grading. Multi-Image to Image (Composition & Style Transfer): Use multiple input images to compose a new scene or transfer the style from one image to another. Iterative Refinement: Engage in a conversation to progressively refine your image over multiple turns, making small adjustments until it's perfect. High-Fidelity Text Rendering: Accurately generate images that contain legible and well-placed text, ideal for logos, diagrams, and posters. All generated images include a SynthID watermark . Image generation (text-to-image) The following code demonstrates how to generate an image based on a descriptive prompt. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () prompt = ( "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme" ) response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ prompt ], ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO ( part . inline_data . data )) image . save ( "generated_image.png" ) JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme" ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "Create a picture of a nano banana dish in a " + " fancy restaurant with a Gemini theme" ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . 
Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-native-image.png AI-generated image of a nano banana dish in a Gemini-themed restaurant Image editing (text-and-image-to-image) Reminder : Make sure you have the necessary rights to any images you upload. Don't generate content that infringe on others' rights, including videos or images that deceive, harass, or harm. Your use of this generative AI service is subject to our Prohibited Use Policy . The following example demonstrates uploading base64 encoded images. For multiple images, larger payloads, and supported MIME types, check the Image understanding page. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () prompt = ( "Create a picture of my cat eating a nano-banana in a " "fancy restaurant under the Gemini constellation" , ) image = Image . open ( "/path/to/cat_image.png" ) response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ prompt , image ], ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO ( part . inline_data . data )) image . save ( "generated_image.png" ) JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "path/to/cat_image.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { text : "Create a picture of my cat eating a nano-banana in a" + "fancy restaurant under the Gemini constellation" }, { inlineData : { mimeType : "image/png" , data : base64Image , }, }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/cat_image.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { genai . NewPartFromText ( "Create a picture of my cat eating a nano-banana in a fancy restaurant under the Gemini constellation" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . 
GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/cat_image.jpeg if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ {\"text\": \"'Create a picture of my cat eating a nano-banana in a fancy restaurant under the Gemini constellation\"}, { \"inline_data\": { \"mime_type\":\"image/jpeg\", \"data\": \" $IMG_BASE64 \" } } ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-edited-image.png AI-generated image of a cat eating a nano banana Other image generation modes Gemini supports other image interaction modes based on prompt structure and context, including: Text to image(s) and text (interleaved): Outputs images with related text. Example prompt: "Generate an illustrated recipe for a paella." Image(s) and text to image(s) and text (interleaved) : Uses input images and text to create new related images and text. Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? can you update the image?" Multi-turn image editing (chat): Keep generating and editing images conversationally. Example prompts: [upload an image of a blue car.] , "Turn this car into a convertible.", "Now change the color to yellow." Prompting guide and strategies Mastering Gemini 2.5 Flash Image Generation starts with one fundamental principle: Describe the scene, don't just list keywords. The model's core strength is its deep language understanding. A narrative, descriptive paragraph will almost always produce a better, more coherent image than a list of disconnected words. Prompts for generating images The following strategies will help you create effective prompts to generate exactly the images you're looking for. 1. Photorealistic scenes For realistic images, use photography terms. Mention camera angles, lens types, lighting, and fine details to guide the model toward a photorealistic result. Template A photorealistic [shot type] of [subject], [action or expression], set in [environment]. The scene is illuminated by [lighting description], creating a [mood] atmosphere. Captured with a [camera/lens details], emphasizing [key textures and details]. The image should be in a [aspect ratio] format. Prompt A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful. Vertical portrait orientation. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . 
Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'photorealistic_example.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "photorealistic_example.png" , buffer ); console . log ( "Image saved as photorealistic_example.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "photorealistic_example.png" _ = os . 
WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > photorealistic_example.png A photorealistic close-up portrait of an elderly Japanese ceramicist... 2. Stylized illustrations & stickers To create stickers, icons, or assets, be explicit about the style and request a transparent background. Template A [style] sticker of a [subject], featuring [key characteristics] and a [color palette]. The design should have [line style] and [shading style]. The background must be transparent. Prompt A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'red_panda_sticker.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "red_panda_sticker.png" , buffer ); console . log ( "Image saved as red_panda_sticker.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . 
GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "red_panda_sticker.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It' "'" 's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > red_panda_sticker.png A kawaii-style sticker of a happy red panda... 3. Accurate text in images Gemini excels at rendering text. Be clear about the text, the font style (descriptively), and the overall design. Template Create a [image type] for [brand/concept] with the text "[text to render]" in a [font style]. The design should be [style description], with a [color scheme]. Prompt Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'logo_example.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "logo_example.png" , buffer ); console . 
log ( "Image saved as logo_example.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "logo_example.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "Create a modern, minimalist logo for a coffee shop called ' "'" 'The Daily Grind' "'" '. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > logo_example.png Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'... 4. Product mockups & commercial photography Perfect for creating clean, professional product shots for e-commerce, advertising, or branding. Template A high-resolution, studio-lit product photograph of a [product description] on a [background surface/description]. The lighting is a [lighting setup, e.g., three-point softbox setup] to [lighting purpose]. The camera angle is a [angle type] to showcase [specific feature]. Ultra-realistic, with sharp focus on [key detail]. [Aspect ratio]. Prompt A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'product_mockup.png' ) image . 
show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "product_mockup.png" , buffer ); console . log ( "Image saved as product_mockup.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "product_mockup.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > product_mockup.png A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug... 5. Minimalist & negative space design Excellent for creating backgrounds for websites, presentations, or marketing materials where text will be overlaid. Template A minimalist composition featuring a single [subject] positioned in the [bottom-right/top-left/etc.] of the frame. The background is a vast, empty [color] canvas, creating significant negative space. Soft, subtle lighting. [Aspect ratio]. Prompt A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. 
The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'minimalist_design.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "minimalist_design.png" , buffer ); console . log ( "Image saved as minimalist_design.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "minimalist_design.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > minimalist_design.png A minimalist composition featuring a single, delicate red maple leaf... 6. 
Sequential art (Comic panel / Storyboard) Builds on character consistency and scene description to create panels for visual storytelling. Template A single comic book panel in a [art style] style. In the foreground, [character description and action]. In the background, [setting details]. The panel has a [dialogue/caption box] with the text "[Text]". The lighting creates a [mood] mood. [Aspect ratio]. Prompt A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads "The city was a tough place to keep secrets." The lighting is harsh, creating a dramatic, somber mood. Landscape. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads \" The city was a tough place to keep secrets. \" The lighting is harsh, creating a dramatic, somber mood. Landscape." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'comic_panel.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads \"The city was a tough place to keep secrets.\" The lighting is harsh, creating a dramatic, somber mood. Landscape." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "comic_panel.png" , buffer ); console . log ( "Image saved as comic_panel.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. 
A caption box at the top reads \"The city was a tough place to keep secrets.\" The lighting is harsh, creating a dramatic, somber mood. Landscape." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "comic_panel.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads \"The city was a tough place to keep secrets.\" The lighting is harsh, creating a dramatic, somber mood. Landscape."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > comic_panel.png A single comic book panel in a gritty, noir art style... Prompts for editing images These examples show how to provide images alongside your text prompts for editing, composition, and style transfer. 1. Adding and removing elements Provide an image and describe your change. The model will match the original image's style, lighting, and perspective. Template Using the provided image of [subject], please [add/remove/modify] [element] to/from the scene. Ensure the change is [description of how the change should integrate]. Prompt "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and matches the soft lighting of the photo." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompt: "A photorealistic picture of a fluffy ginger cat sitting on a wooden floor, looking directly at the camera. Soft, natural light from a window." image_input = Image . open ( '/path/to/your/cat_photo.png' ) text_input = """Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ text_input , image_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'cat_with_hat.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "/path/to/your/cat_photo.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { text : "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off." }, { inlineData : { mimeType : "image/png" , data : base64Image , }, }, ]; const response = await ai . models . 
generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "cat_with_hat.png" , buffer ); console . log ( "Image saved as cat_with_hat.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/your/cat_photo.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { genai . NewPartFromText ( "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off." ), & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "cat_with_hat.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/cat_photo.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ {\"text\": \"Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off.\"}, { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG_BASE64 \" } } ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > cat_with_hat.png Input Output A photorealistic picture of a fluffy ginger cat... Using the provided image of my cat, please add a small, knitted wizard hat... 2. Inpainting (Semantic masking) Conversationally define a "mask" to edit a specific part of an image while leaving the rest untouched. Template Using the provided image, change only the [specific element] to [new element/description]. Keep everything else in the image exactly the same, preserving the original style, lighting, and composition. Prompt "Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompt: "A wide shot of a modern, well-lit living room with a prominent blue sofa in the center. A coffee table is in front of it and a large window is in the background." living_room_image = Image . 
open ( '/path/to/your/living_room.png' ) text_input = """Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ living_room_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'living_room_edited.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "/path/to/your/living_room.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image , }, }, { text : "Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "living_room_edited.png" , buffer ); console . log ( "Image saved as living_room_edited.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/your/living_room.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, genai . NewPartFromText ( "Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "living_room_edited.png" _ = os . 
WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/living_room.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG_BASE64 \" } }, {\"text\": \"Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > living_room_edited.png Input Output A wide shot of a modern, well-lit living room... Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa... 3. Style transfer Provide an image and ask the model to recreate its content in a different artistic style. Template Transform the provided photograph of [subject] into the artistic style of [artist/art style]. Preserve the original composition but render it with [description of stylistic elements]. Prompt "Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompt: "A photorealistic, high-resolution photograph of a busy city street in New York at night, with bright neon signs, yellow taxis, and tall skyscrapers." city_image = Image . open ( '/path/to/your/city.png' ) text_input = """Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ city_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'city_style_transfer.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "/path/to/your/city.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image , }, }, { text : "Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows." 
}, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "city_style_transfer.png" , buffer ); console . log ( "Image saved as city_style_transfer.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/your/city.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, genai . NewPartFromText ( "Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "city_style_transfer.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/city.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG_BASE64 \" } }, {\"text\": \"Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > city_style_transfer.png Input Output A photorealistic, high-resolution photograph of a busy city street... Transform the provided photograph of a modern city street at night... 4. Advanced composition: Combining multiple images Provide multiple images as context to create a new, composite scene. This is perfect for product mockups or creative collages. Template Create a new image by combining the elements from the provided images. Take the [element from image 1] and place it with/on the [element from image 2]. The final image should be a [description of the final scene]. Prompt "Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. 
Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompts: # 1. Dress: "A professionally shot photo of a blue floral summer dress on a plain white background, ghost mannequin style." # 2. Model: "Full-body shot of a woman with her hair in a bun, smiling, standing against a neutral grey studio background." dress_image = Image . open ( '/path/to/your/dress.png' ) model_image = Image . open ( '/path/to/your/model.png' ) text_input = """Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ dress_image , model_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'fashion_ecommerce_shot.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath1 = "/path/to/your/dress.png" ; const imageData1 = fs . readFileSync ( imagePath1 ); const base64Image1 = imageData1 . toString ( "base64" ); const imagePath2 = "/path/to/your/model.png" ; const imageData2 = fs . readFileSync ( imagePath2 ); const base64Image2 = imageData2 . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image1 , }, }, { inlineData : { mimeType : "image/png" , data : base64Image2 , }, }, { text : "Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "fashion_ecommerce_shot.png" , buffer ); console . log ( "Image saved as fashion_ecommerce_shot.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imgData1 , _ := os . ReadFile ( "/path/to/your/dress.png" ) imgData2 , _ := os . ReadFile ( "/path/to/your/model.png" ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData1 , }, }, & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData2 , }, }, genai . NewPartFromText ( "Create a professional e-commerce fashion photo. 
Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "fashion_ecommerce_shot.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH1 = /path/to/your/dress.png IMG_PATH2 = /path/to/your/model.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG1_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH1 " 2>&1 ) IMG2_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH2 " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG1_BASE64 \" } }, { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG2_BASE64 \" } }, {\"text\": \"Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > fashion_ecommerce_shot.png Input 1 Input 2 Output A professionally shot photo of a blue floral summer dress... Full-body shot of a woman with her hair in a bun... Create a professional e-commerce fashion photo... 5. High-fidelity detail preservation To ensure critical details (like a face or logo) are preserved during an edit, describe them in great detail along with your edit request. Template Using the provided images, place [element from image 2] onto [element from image 1]. Ensure that the features of [element from image 1] remain completely unchanged. The added element should [description of how the element should integrate]. Prompt "Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompts: # 1. Woman: "A professional headshot of a woman with brown hair and blue eyes, wearing a plain black t-shirt, against a neutral studio background." # 2. Logo: "A simple, modern logo with the letters 'G' and 'A' in a white circle." woman_image = Image . open ( '/path/to/your/woman.png' ) logo_image = Image . open ( '/path/to/your/logo.png' ) text_input = """Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. 
The logo should look like it's naturally printed on the fabric, following the folds of the shirt.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ woman_image , logo_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'woman_with_logo.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath1 = "/path/to/your/woman.png" ; const imageData1 = fs . readFileSync ( imagePath1 ); const base64Image1 = imageData1 . toString ( "base64" ); const imagePath2 = "/path/to/your/logo.png" ; const imageData2 = fs . readFileSync ( imagePath2 ); const base64Image2 = imageData2 . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image1 , }, }, { inlineData : { mimeType : "image/png" , data : base64Image2 , }, }, { text : "Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "woman_with_logo.png" , buffer ); console . log ( "Image saved as woman_with_logo.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imgData1 , _ := os . ReadFile ( "/path/to/your/woman.png" ) imgData2 , _ := os . ReadFile ( "/path/to/your/logo.png" ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData1 , }, }, & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData2 , }, }, genai . NewPartFromText ( "Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "woman_with_logo.png" _ = os . 
WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH1 = /path/to/your/woman.png IMG_PATH2 = /path/to/your/logo.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG1_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH1 " 2>&1 ) IMG2_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH2 " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG1_BASE64 \" } }, { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG2_BASE64 \" } }, {\"text\": \"Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > woman_with_logo.png Input 1 Input 2 Output A professional headshot of a woman with brown hair and blue eyes... A simple, modern logo with the letters 'G' and 'A'... Take the first image of the woman with brown hair, blue eyes, and a neutral expression... Best Practices To elevate your results from good to great, incorporate these professional strategies into your workflow. Be Hyper-Specific: The more detail you provide, the more control you have. Instead of "fantasy armor," describe it: "ornate elven plate armor, etched with silver leaf patterns, with a high collar and pauldrons shaped like falcon wings." Provide Context and Intent: Explain the purpose of the image. The model's understanding of context will influence the final output. For example, "Create a logo for a high-end, minimalist skincare brand" will yield better results than just "Create a logo." Iterate and Refine: Don't expect a perfect image on the first try. Use the conversational nature of the model to make small changes. Follow up with prompts like, "That's great, but can you make the lighting a bit warmer?" or "Keep everything the same, but change the character's expression to be more serious." Use Step-by-Step Instructions: For complex scenes with many elements, break your prompt into steps. "First, create a background of a serene, misty forest at dawn. Then, in the foreground, add a moss-covered ancient stone altar. Finally, place a single, glowing sword on top of the altar." Use "Semantic Negative Prompts": Instead of saying "no cars," describe the desired scene positively: "an empty, deserted street with no signs of traffic." Control the Camera: Use photographic and cinematic language to control the composition. Terms like wide-angle shot , macro shot , low-angle perspective . Limitations For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN. Image generation does not support audio or video inputs. The model won't always follow the exact number of image outputs that the user explicitly asked for. The model works best with up to 3 images as an input. When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text. Uploading images of children is not currently supported in EEA, CH, and UK. All generated images include a SynthID watermark . 
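The "Iterate and Refine" and "Use Step-by-Step Instructions" practices above lend themselves to a scripted multi-turn workflow. The sketch below is illustrative only: it assumes the SDK's chat interface can be used with the image preview model and that each turn may return image parts; the prompts and file names are placeholders, not part of the official guide. Python
from google import genai
from PIL import Image
from io import BytesIO

client = genai.Client()

# Illustrative sketch: a chat session lets each follow-up prompt build on the
# previous turn, so edits can be requested without restating the full scene.
chat = client.chats.create(model="gemini-2.5-flash-image-preview")

def save_first_image(response, filename):
    # Saves the first image part of a response, if one is present.
    image_parts = [
        part.inline_data.data
        for part in response.candidates[0].content.parts
        if part.inline_data
    ]
    if image_parts:
        Image.open(BytesIO(image_parts[0])).save(filename)

# Turn 1: establish the core scene.
response = chat.send_message(
    "A serene, misty forest at dawn with a moss-covered ancient stone altar in the foreground."
)
save_first_image(response, "altar_step1.png")

# Turn 2: a small, targeted refinement that keeps the rest of the image unchanged.
response = chat.send_message(
    "Keep everything the same, but place a single glowing sword on top of the altar "
    "and make the lighting a bit warmer."
)
save_first_image(response, "altar_step2.png")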
When to use Imagen In addition to using Gemini's built-in image generation capabilities, you can also access Imagen , our specialized image generation model, through the Gemini API. Attribute Imagen Gemini Native Image Strengths Most capable image generation model to date. Recommended for photorealistic images, sharper clarity, improved spelling and typography. Default recommendation. Unparalleled flexibility, contextual understanding, and simple, mask-free editing. Uniquely capable of multi-turn conversational editing. Availability Generally available Preview (Production usage allowed) Latency Low . Optimized for near-real-time performance. Higher. More computation is required for its advanced capabilities. Cost Cost-effective for specialized tasks. $0.02/image to $0.12/image Token-based pricing. $30 per 1 million tokens for image output (image output tokenized at 1290 tokens per image flat, up to 1024x1024px) Recommended tasks Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities. Infusing branding, style, or generating logos and product designs. Generating advanced spelling or typography. Interleaved text and image generation to seamlessly blend text and images. Combine creative elements from multiple images with a single prompt. Make highly specific edits to images, modify individual elements with simple language commands, and iteratively work on an image. Apply a specific design or texture from one image to another while preserving the original subject's form and details. Imagen 4 should be your go-to model starting to generate images with Imagen. Choose Imagen 4 Ultra for advanced use-cases or when you need the best image quality (note that can only generate one image at a time). What's next Find more examples and code samples in the cookbook guide . Check out the Veo guide to learn how to generate videos with the Gemini API. To learn more about Gemini models, see Gemini models . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-08 UTC.
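A quick worked example of the token-based pricing quoted in the comparison table above: at a flat 1290 output tokens per image and $30 per 1 million output tokens, one generated image comes to roughly 1290 / 1,000,000 × $30 ≈ $0.039, which you can weigh against Imagen's listed $0.02 to $0.12 per image when estimating costs.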
text_content/docs_image-generation_46bf3e9e.txt CHANGED
@@ -1,5 +1,5 @@
  URL: https://ai.google.dev/gemini-api/docs/image-generation#main-content
- Title: Image generation  |  Gemini API  |  Google AI for Developers
  ==================================================
 
- Image generation | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Image generation You can generate images using the Gemini API with either Gemini's built-in multimodal capabilities or Imagen, Google's specialized image generation models. For most use cases, start with Gemini . Choose Imagen for specialized tasks where image quality is critical. See Choosing the right model section for more guidance. All generated images include a SynthID watermark . Before you begin Ensure you use a supported model and version for image generation: For Gemini , use Gemini 2.0 Flash Preview Image Generation. For Imagen , use one of the Imagen models (Imagen 3, Imagen 4 or Imagen 4 Ultra). Note that those models are only available on the Paid tier . You can access both Gemini and Imagen models using the same libraries. Note: Image generation may not be available in all regions and countries, review our Models page for more information. Generate images using Gemini Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both to achieve various image-related tasks, such as image generation and editing. You must include responseModalities : ["TEXT", "IMAGE"] in your configuration. Image-only output is not supported with these models. Image generation (text-to-image) The following code demonstrates how to generate an image based on a descriptive prompt: Python from google import genai from google.genai import types from PIL import Image from io import BytesIO import base64 client = genai . Client () contents = ( 'Hi, can you create a 3d rendered image of a pig ' 'with wings and a top hat flying over a happy ' 'futuristic scifi city with lots of greenery?' ) response = client . models . generate_content ( model = "gemini-2.0-flash-preview-image-generation" , contents = contents , config = types . GenerateContentConfig ( response_modalities = [ 'TEXT' , 'IMAGE' ] ) ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO (( part . inline_data . data ))) image . save ( 'gemini-native-image.png' ) image . show () JavaScript Note: We've released the Google SDK for TypeScript and JavaScript in preview launch stage . Use this SDK for image generation features. import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const contents = "Hi, can you create a 3d rendered image of a pig " + "with wings and a top hat flying over a happy " + "futuristic scifi city with lots of greenery?" ; // Set responseModalities to include "Image" so the model can generate an image const response = await ai . models . generateContent ({ model : "gemini-2.0-flash-preview-image-generation" , contents : contents , config : { responseModalities : [ Modality . TEXT , Modality . IMAGE ], }, }); for ( const part of response . candidates [ 0 ]. content . parts ) { // Based on the part type, either show the text or save the image if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . 
inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateContentConfig { ResponseModalities : [] string { "TEXT" , "IMAGE" }, } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.0-flash-preview-image-generation" , genai . Text ( "Hi, can you create a 3d rendered image of a pig " + "with wings and a top hat flying over a happy " + "futuristic scifi city with lots of greenery?" ), config , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"} ] }], "generationConfig":{"responseModalities":["TEXT","IMAGE"]} }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-native-image.png AI-generated image of a fantastical flying pig Image editing (text-and-image-to-image) To perform image editing, add an image as input. The following example demonstrates uploading base64 encoded images. For multiple images and larger payloads, check the image input section. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO import PIL.Image image = PIL . Image . open ( '/path/to/image.png' ) client = genai . Client () text_input = ( 'Hi, This is a picture of me.' 'Can you add a llama next to me?' ,) response = client . models . generate_content ( model = "gemini-2.0-flash-preview-image-generation" , contents = [ text_input , image ], config = types . GenerateContentConfig ( response_modalities = [ 'TEXT' , 'IMAGE' ] ) ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO (( part . inline_data . data ))) image . show () JavaScript Note: We've released the Google SDK for TypeScript and JavaScript in preview launch stage . Use this SDK for image generation features. import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); // Load the image from the local file system const imagePath = "path/to/image.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); // Prepare the content parts const contents = [ { text : "Can you add a llama next to the image?" }, { inlineData : { mimeType : "image/png" , data : base64Image , }, }, ]; // Set responseModalities to include "Image" so the model can generate an image const response = await ai . models . 
generateContent ({ model : "gemini-2.0-flash-preview-image-generation" , contents : contents , config : { responseModalities : [ Modality . TEXT , Modality . IMAGE ], }, }); for ( const part of response . candidates [ 0 ]. content . parts ) { // Based on the part type, either show the text or save the image if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/image.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { genai . NewPartFromText ( "Hi, This is a picture of me. Can you add a llama next to me?" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } config := & genai . GenerateContentConfig { ResponseModalities : [] string { "TEXT" , "IMAGE" }, } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.0-flash-preview-image-generation" , contents , config , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/image1.jpeg if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ {\"text\": \"'Hi, This is a picture of me. Can you add a llama next to me\"}, { \"inline_data\": { \"mime_type\":\"image/jpeg\", \"data\": \" $IMG_BASE64 \" } } ] }], \"generationConfig\": {\"responseModalities\": [\"TEXT\", \"IMAGE\"]} }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-edited-image.png Other image generation modes Gemini supports other image interaction modes based on prompt structure and context, including: Text to image(s) and text (interleaved): Outputs images with related text. Example prompt: "Generate an illustrated recipe for a paella." Image(s) and text to image(s) and text (interleaved) : Uses input images and text to create new related images and text. Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? can you update the image?" Multi-turn image editing (chat): Keep generating / editing images conversationally. Example prompts: [upload an image of a blue car.] , "Turn this car into a convertible.", "Now change the color to yellow." Limitations For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN. Image generation does not support audio or video inputs. Image generation may not always trigger: The model may output text only. Try asking for image outputs explicitly (e.g. 
"generate an image", "provide images as you go along", "update the image"). The model may stop generating partway through. Try again or try a different prompt. When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text. There are some regions/countries where Image generation is not available. See Models for more information. Generate images using the Imagen models This example demonstrates generating images with an Imagen model : Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () response = client . models . generate_images ( model = 'imagen-4.0-generate-preview-06-06' , prompt = 'Robot holding a red skateboard' , config = types . GenerateImagesConfig ( number_of_images = 4 , ) ) for generated_image in response . generated_images : generated_image . image . show () JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const response = await ai . models . generateImages ({ model : 'imagen-4.0-generate-preview-06-06' , prompt : 'Robot holding a red skateboard' , config : { numberOfImages : 4 , }, }); let idx = 1 ; for ( const generatedImage of response . generatedImages ) { let imgBytes = generatedImage . image . imageBytes ; const buffer = Buffer . from ( imgBytes , "base64" ); fs . writeFileSync ( `imagen- ${ idx } .png` , buffer ); idx ++ ; } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateImagesConfig { NumberOfImages : 4 , } response , _ := client . Models . GenerateImages ( ctx , "imagen-4.0-generate-preview-06-06" , "Robot holding a red skateboard" , config , ) for n , image := range response . GeneratedImages { fname := fmt . Sprintf ( "imagen-%d.png" , n ) _ = os . WriteFile ( fname , image . Image . ImageBytes , 0644 ) } } REST curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/imagen-4.0-generate-preview-06-06:predict" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "instances": [ { "prompt": "Robot holding a red skateboard" } ], "parameters": { "sampleCount": 4 } }' AI-generated image of a robot holding a red skateboard Imagen configuration Imagen supports English only prompts at this time and the following parameters: Note: Naming conventions of parameters vary by programming language. numberOfImages : The number of images to generate, from 1 to 4 (inclusive). The default is 4. For Imagen 4 Ultra, it defaults to 1 as only one image can be generated at a time. aspectRatio : Changes the aspect ratio of the generated image. Supported values are "1:1" , "3:4" , "4:3" , "9:16" , and "16:9" . The default is "1:1" . personGeneration : Allow the model to generate images of people. The following values are supported: "dont_allow" : Block generation of images of people. "allow_adult" : Generate images of adults, but not children. This is the default. "allow_all" : Generate images that include adults and children. Note: The "allow_all" parameter value is not allowed in EU, UK, CH, MENA locations. Choosing the right model Choose Gemini when: You need contextually relevant images that leverage world knowledge and reasoning. Seamlessly blending text and images is important. 
You want accurate visuals embedded within long text sequences. You want to edit images conversationally while maintaining context. Choose Imagen when: Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities. Performing specialized editing tasks like product background updates or image upscaling. Infusing branding, style, or generating logos and product designs. Imagen 4 should be your go-to model starting to generate images with Imagen. Choose Imagen 4 Ultra for advanced use-cases or when you need the best image quality. Note that Imagen 4 Ultra can only generate one image at a time. Imagen prompt guide This section of the Imagen guide shows you how modifying a text-to-image prompt can produce different results, along with examples of images you can create. Prompt writing basics Note: Maximum prompt length is 480 tokens. A good prompt is descriptive and clear, and makes use of meaningful keywords and modifiers. Start by thinking of your subject , context , and style . Image text: A sketch ( style ) of a modern apartment building ( subject ) surrounded by skyscrapers ( context and background ). Subject : The first thing to think about with any prompt is the subject : the object, person, animal, or scenery you want an image of. Context and background: Just as important is the background or context in which the subject will be placed. Try placing your subject in a variety of backgrounds. For example, a studio with a white background, outdoors, or indoor environments. Style: Finally, add the style of image you want. Styles can be general (painting, photograph, sketches) or very specific (pastel painting, charcoal drawing, isometric 3D). You can also combine styles. After you write a first version of your prompt, refine your prompt by adding more details until you get to the image that you want. Iteration is important. Start by establishing your core idea, and then refine and expand upon that core idea until the generated image is close to your vision. Prompt: A park in the spring next to a lake Prompt: A park in the spring next to a lake, the sun sets across the lake, golden hour Prompt: A park in the spring next to a lake, the sun sets across the lake, golden hour, red wildflowers Imagen models can transform your ideas into detailed images, whether your prompts are short or long and detailed. Refine your vision through iterative prompting, adding details until you achieve the perfect result. Short prompts let you generate an image quickly. Prompt: close-up photo of a woman in her 20s, street photography, movie still, muted orange warm tones Longer prompts let you add specific details and build your image. Prompt: captivating photo of a woman in her 20s utilizing a street photography style. The image should look like a movie still with muted orange warm tones. Additional advice for Imagen prompt writing: Use descriptive language : Employ detailed adjectives and adverbs to paint a clear picture for Imagen. Provide context : If necessary, include background information to aid the AI's understanding. Reference specific artists or styles : If you have a particular aesthetic in mind, referencing specific artists or art movements can be helpful. Use prompt engineering tools : Consider exploring prompt engineering tools or resources to help you refine your prompts and achieve optimal results. 
Enhancing the facial details in your personal and group images : Specify facial details as a focus of the photo (for example, use the word "portrait" in the prompt). Generate text in images Imagen models can add text into images, opening up more creative image generation possibilities. Use the following guidance to get the most out of this feature: Iterate with confidence : You might have to regenerate images until you achieve the look you want. Imagen's text integration is still evolving, and sometimes multiple attempts yield the best results. Keep it short : Limit text to 25 characters or less for optimal generation. Multiple phrases : Experiment with two or three distinct phrases to provide additional information. Avoid exceeding three phrases for cleaner compositions. Prompt: A poster with the text "Summerland" in bold font as a title, underneath this text is the slogan "Summer never felt so good" Guide Placement : While Imagen can attempt to position text as directed, expect occasional variations. This feature is continually improving. Inspire font style : Specify a general font style to subtly influence Imagen's choices. Don't rely on precise font replication, but expect creative interpretations. Font size : Specify a font size or a general indication of size (for example, small , medium , large ) to influence the font size generation. Prompt parameterization To better control output results, you might find it helpful to parameterize the inputs into Imagen. For example, suppose you want your customers to be able to generate logos for their business, and you want to make sure logos are always generated on a solid color background. You also want to limit the options that the client can select from a menu. In this example, you can create a parameterized prompt similar to the following: A {logo_style} logo for a {company_area} company on a solid color background. Include the text {company_name} . In your custom user interface, the customer can input the parameters using a menu, and their chosen value populates the prompt Imagen receives. For example: Prompt: A minimalist logo for a health care company on a solid color background. Include the text Journey . Prompt: A modern logo for a software company on a solid color background. Include the text Silo . Prompt: A traditional logo for a baking company on a solid color background. Include the text Seed . Advanced prompt writing techniques Use the following examples to create more specific prompts based on attributes like photography descriptors, shapes and materials, historical art movements, and image quality modifiers. Photography Prompt includes: "A photo of..." To use this style, start with using keywords that clearly tell Imagen that you're looking for a photograph. Start your prompts with "A photo of. . ." . For example: Prompt: A photo of coffee beans in a kitchen on a wooden surface Prompt: A photo of a chocolate bar on a kitchen counter Prompt: A photo of a modern building with water in the background Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Photography modifiers In the following examples, you can see several photography-specific modifiers and parameters. You can combine multiple modifiers for more precise control. 
Camera Proximity - Close up, taken from far away Prompt: A close-up photo of coffee beans Prompt: A zoomed out photo of a small bag of coffee beans in a messy kitchen Camera Position - aerial, from below Prompt: aerial photo of urban city with skyscrapers Prompt: A photo of a forest canopy with blue skies from below Lighting - natural, dramatic, warm, cold Prompt: studio photo of a modern arm chair, natural lighting Prompt: studio photo of a modern arm chair, dramatic lighting Camera Settings - motion blur, soft focus, bokeh, portrait Prompt: photo of a city with skyscrapers from the inside of a car with motion blur Prompt: soft focus photograph of a bridge in an urban city at night Lens types - 35mm, 50mm, fisheye, wide angle, macro Prompt: photo of a leaf, macro lens Prompt: street photography, new york city, fisheye lens Film types - black and white, polaroid Prompt: a polaroid portrait of a dog wearing sunglasses Prompt: black and white photo of a dog wearing sunglasses Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Illustration and art Prompt includes: "A painting of..." , "A sketch of..." Art styles vary from monochrome styles like pencil sketches, to hyper-realistic digital art. For example, the following images use the same prompt with different styles: "An [art style or creation technique] of an angular sporty electric sedan with skyscrapers in the background" Prompt: A technical pencil drawing of an angular... Prompt: A charcoal drawing of an angular... Prompt: A color pencil drawing of an angular... Prompt: A pastel painting of an angular... Prompt: A digital art of an angular... Prompt: An art deco (poster) of an angular... Image source: Each image was generated using its corresponding text prompt with the Imagen 2 model. Shapes and materials Prompt includes: "...made of..." , "...in the shape of..." One of the strengths of this technology is that you can create imagery that is otherwise difficult or impossible. For example, you can recreate your company logo in different materials and textures. Prompt: a duffle bag made of cheese Prompt: neon tubes in the shape of a bird Prompt: an armchair made of paper , studio photo, origami style Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Historical art references Prompt includes: "...in the style of..." Certain styles have become iconic over the years. The following are some ideas of historical painting or art styles that you can try. "generate an image in the style of [art period or movement] : a wind farm" Prompt: generate an image in the style of an impressionist painting : a wind farm Prompt: generate an image in the style of a renaissance painting : a wind farm Prompt: generate an image in the style of pop art : a wind farm Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Image quality modifiers Certain keywords can let the model know that you're looking for a high-quality asset. Examples of quality modifiers include the following: General Modifiers - high-quality, beautiful, stylized Photos - 4K, HDR, Studio Photo Art, Illustration - by a professional, detailed The following are a few examples of prompts without quality modifiers and the same prompt with quality modifiers. 
Prompt (no quality modifiers): a photo of a corn stalk Prompt (with quality modifiers): 4k HDR beautiful photo of a corn stalk taken by a professional photographer Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Aspect ratios Imagen image generation lets you set five distinct image aspect ratios. Square (1:1, default) - A standard square photo. Common uses for this aspect ratio include social media posts. Fullscreen (4:3) - This aspect ratio is commonly used in media or film. It is also the dimensions of most old (non-widescreen) TVs and medium format cameras. It captures more of the scene horizontally (compared to 1:1), making it a preferred aspect ratio for photography. Prompt: close up of a musician's fingers playing the piano, black and white film, vintage (4:3 aspect ratio) Prompt: A professional studio photo of french fries for a high end restaurant, in the style of a food magazine (4:3 aspect ratio) Portrait full screen (3:4) - This is the fullscreen aspect ratio rotated 90 degrees. This lets you capture more of the scene vertically compared to the 1:1 aspect ratio. Prompt: a woman hiking, close of her boots reflected in a puddle, large mountains in the background, in the style of an advertisement, dramatic angles (3:4 aspect ratio) Prompt: aerial shot of a river flowing up a mystical valley (3:4 aspect ratio) Widescreen (16:9) - This ratio has replaced 4:3 and is now the most common aspect ratio for TVs, monitors, and mobile phone screens (landscape). Use this aspect ratio when you want to capture more of the background (for example, scenic landscapes). Prompt: a man wearing all white clothing sitting on the beach, close up, golden hour lighting (16:9 aspect ratio) Portrait (9:16) - This ratio is widescreen but rotated. This is a relatively new aspect ratio that has been popularized by short form video apps (for example, YouTube shorts). Use this for tall objects with strong vertical orientations such as buildings, trees, waterfalls, or other similar objects. Prompt: a digital render of a massive skyscraper, modern, grand, epic with a beautiful sunset in the background (9:16 aspect ratio) Photorealistic images Different versions of the image generation model might offer a mix of artistic and photorealistic output. Use the following wording in prompts to generate more photorealistic output, based on the subject you want to generate. Note: Take these keywords as general guidance when you try to create photorealistic images. They aren't required to achieve your goal.
Use case Lens type Focal lengths Additional details People (portraits) Prime, zoom 24-35mm black and white film, Film noir, Depth of field, duotone (mention two colors) Food, insects, plants (objects, still life) Macro 60-105mm High detail, precise focusing, controlled lighting Sports, wildlife (motion) Telephoto zoom 100-400mm Fast shutter speed, Action or movement tracking Astronomical, landscape (wide-angle) Wide-angle 10-24mm Long exposure times, sharp focus, long exposure, smooth water or clouds Portraits Use case Lens type Focal lengths Additional details People (portraits) Prime, zoom 24-35mm black and white film, Film noir, Depth of field, duotone (mention two colors) Using several keywords from the table, Imagen can generate the following portraits: Prompt: A woman, 35mm portrait, blue and grey duotones Model: imagen-3.0-generate-002 Prompt: A woman, 35mm portrait, film noir Model: imagen-3.0-generate-002 Objects Use case Lens type Focal lengths Additional details Food, insects, plants (objects, still life) Macro 60-105mm High detail, precise focusing, controlled lighting Using several keywords from the table, Imagen can generate the following object images: Prompt: leaf of a prayer plant, macro lens, 60mm Model: imagen-3.0-generate-002 Prompt: a plate of pasta, 100mm Macro lens Model: imagen-3.0-generate-002 Motion Use case Lens type Focal lengths Additional details Sports, wildlife (motion) Telephoto zoom 100-400mm Fast shutter speed, Action or movement tracking Using several keywords from the table, Imagen can generate the following motion images: Prompt: a winning touchdown, fast shutter speed, movement tracking Model: imagen-3.0-generate-002 Prompt: A deer running in the forest, fast shutter speed, movement tracking Model: imagen-3.0-generate-002 Wide-angle Use case Lens type Focal lengths Additional details Astronomical, landscape (wide-angle) Wide-angle 10-24mm Long exposure times, sharp focus, long exposure, smooth water or clouds Using several keywords from the table, Imagen can generate the following wide-angle images: Prompt: an expansive mountain range, landscape wide angle 10mm Model: imagen-3.0-generate-002 Prompt: a photo of the moon, astro photography, wide angle 10mm Model: imagen-3.0-generate-002 What's next Check out the Veo guide to learn how to generate videos with the Gemini API. To learn more about Gemini models, see Gemini models and Experimental models . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
 
URL: https://ai.google.dev/gemini-api/docs/image-generation#main-content
Title: Image generation with Gemini (aka Nano Banana)  |  Gemini API  |  Google AI for Developers
==================================================

+ Image generation with Gemini (aka Nano Banana) | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Image generation with Gemini (aka Nano Banana) Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both allowing you to create, edit, and iterate on visuals with unprecedented control: Text-to-Image: Generate high-quality images from simple or complex text descriptions. Image + Text-to-Image (Editing): Provide an image and use text prompts to add, remove, or modify elements, change the style, or adjust the color grading. Multi-Image to Image (Composition & Style Transfer): Use multiple input images to compose a new scene or transfer the style from one image to another. Iterative Refinement: Engage in a conversation to progressively refine your image over multiple turns, making small adjustments until it's perfect. High-Fidelity Text Rendering: Accurately generate images that contain legible and well-placed text, ideal for logos, diagrams, and posters. All generated images include a SynthID watermark . Image generation (text-to-image) The following code demonstrates how to generate an image based on a descriptive prompt. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () prompt = ( "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme" ) response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ prompt ], ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO ( part . inline_data . data )) image . save ( "generated_image.png" ) JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme" ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "Create a picture of a nano banana dish in a " + " fancy restaurant with a Gemini theme" ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . 
Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-native-image.png AI-generated image of a nano banana dish in a Gemini-themed restaurant Image editing (text-and-image-to-image) Reminder : Make sure you have the necessary rights to any images you upload. Don't generate content that infringe on others' rights, including videos or images that deceive, harass, or harm. Your use of this generative AI service is subject to our Prohibited Use Policy . The following example demonstrates uploading base64 encoded images. For multiple images, larger payloads, and supported MIME types, check the Image understanding page. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () prompt = ( "Create a picture of my cat eating a nano-banana in a " "fancy restaurant under the Gemini constellation" , ) image = Image . open ( "/path/to/cat_image.png" ) response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ prompt , image ], ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO ( part . inline_data . data )) image . save ( "generated_image.png" ) JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "path/to/cat_image.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { text : "Create a picture of my cat eating a nano-banana in a" + "fancy restaurant under the Gemini constellation" }, { inlineData : { mimeType : "image/png" , data : base64Image , }, }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/cat_image.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { genai . NewPartFromText ( "Create a picture of my cat eating a nano-banana in a fancy restaurant under the Gemini constellation" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . 
GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/cat_image.jpeg if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ {\"text\": \"'Create a picture of my cat eating a nano-banana in a fancy restaurant under the Gemini constellation\"}, { \"inline_data\": { \"mime_type\":\"image/jpeg\", \"data\": \" $IMG_BASE64 \" } } ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-edited-image.png AI-generated image of a cat eating a nano banana Other image generation modes Gemini supports other image interaction modes based on prompt structure and context, including: Text to image(s) and text (interleaved): Outputs images with related text. Example prompt: "Generate an illustrated recipe for a paella." Image(s) and text to image(s) and text (interleaved) : Uses input images and text to create new related images and text. Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? can you update the image?" Multi-turn image editing (chat): Keep generating and editing images conversationally. Example prompts: [upload an image of a blue car.] , "Turn this car into a convertible.", "Now change the color to yellow." Prompting guide and strategies Mastering Gemini 2.5 Flash Image Generation starts with one fundamental principle: Describe the scene, don't just list keywords. The model's core strength is its deep language understanding. A narrative, descriptive paragraph will almost always produce a better, more coherent image than a list of disconnected words. Prompts for generating images The following strategies will help you create effective prompts to generate exactly the images you're looking for. 1. Photorealistic scenes For realistic images, use photography terms. Mention camera angles, lens types, lighting, and fine details to guide the model toward a photorealistic result. Template A photorealistic [shot type] of [subject], [action or expression], set in [environment]. The scene is illuminated by [lighting description], creating a [mood] atmosphere. Captured with a [camera/lens details], emphasizing [key textures and details]. The image should be in a [aspect ratio] format. Prompt A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful. Vertical portrait orientation. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . 
Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'photorealistic_example.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "photorealistic_example.png" , buffer ); console . log ( "Image saved as photorealistic_example.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "photorealistic_example.png" _ = os . 
WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > photorealistic_example.png A photorealistic close-up portrait of an elderly Japanese ceramicist... 2. Stylized illustrations & stickers To create stickers, icons, or assets, be explicit about the style and request a transparent background. Template A [style] sticker of a [subject], featuring [key characteristics] and a [color palette]. The design should have [line style] and [shading style]. The background must be transparent. Prompt A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'red_panda_sticker.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "red_panda_sticker.png" , buffer ); console . log ( "Image saved as red_panda_sticker.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . 
GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "red_panda_sticker.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It' "'" 's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > red_panda_sticker.png A kawaii-style sticker of a happy red panda... 3. Accurate text in images Gemini excels at rendering text. Be clear about the text, the font style (descriptively), and the overall design. Template Create a [image type] for [brand/concept] with the text "[text to render]" in a [font style]. The design should be [style description], with a [color scheme]. Prompt Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'logo_example.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "logo_example.png" , buffer ); console . 
log ( "Image saved as logo_example.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "logo_example.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "Create a modern, minimalist logo for a coffee shop called ' "'" 'The Daily Grind' "'" '. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > logo_example.png Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'... 4. Product mockups & commercial photography Perfect for creating clean, professional product shots for e-commerce, advertising, or branding. Template A high-resolution, studio-lit product photograph of a [product description] on a [background surface/description]. The lighting is a [lighting setup, e.g., three-point softbox setup] to [lighting purpose]. The camera angle is a [angle type] to showcase [specific feature]. Ultra-realistic, with sharp focus on [key detail]. [Aspect ratio]. Prompt A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'product_mockup.png' ) image . 
show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "product_mockup.png" , buffer ); console . log ( "Image saved as product_mockup.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "product_mockup.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > product_mockup.png A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug... 5. Minimalist & negative space design Excellent for creating backgrounds for websites, presentations, or marketing materials where text will be overlaid. Template A minimalist composition featuring a single [subject] positioned in the [bottom-right/top-left/etc.] of the frame. The background is a vast, empty [color] canvas, creating significant negative space. Soft, subtle lighting. [Aspect ratio]. Prompt A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. 
The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'minimalist_design.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "minimalist_design.png" , buffer ); console . log ( "Image saved as minimalist_design.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "minimalist_design.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > minimalist_design.png A minimalist composition featuring a single, delicate red maple leaf... 6. 
Sequential art (Comic panel / Storyboard) Builds on character consistency and scene description to create panels for visual storytelling. Template A single comic book panel in a [art style] style. In the foreground, [character description and action]. In the background, [setting details]. The panel has a [dialogue/caption box] with the text "[Text]". The lighting creates a [mood] mood. [Aspect ratio]. Prompt A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads "The city was a tough place to keep secrets." The lighting is harsh, creating a dramatic, somber mood. Landscape. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads \" The city was a tough place to keep secrets. \" The lighting is harsh, creating a dramatic, somber mood. Landscape." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'comic_panel.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads \"The city was a tough place to keep secrets.\" The lighting is harsh, creating a dramatic, somber mood. Landscape." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "comic_panel.png" , buffer ); console . log ( "Image saved as comic_panel.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. 
A caption box at the top reads \"The city was a tough place to keep secrets.\" The lighting is harsh, creating a dramatic, somber mood. Landscape." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "comic_panel.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads \"The city was a tough place to keep secrets.\" The lighting is harsh, creating a dramatic, somber mood. Landscape."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > comic_panel.png A single comic book panel in a gritty, noir art style... Prompts for editing images These examples show how to provide images alongside your text prompts for editing, composition, and style transfer. 1. Adding and removing elements Provide an image and describe your change. The model will match the original image's style, lighting, and perspective. Template Using the provided image of [subject], please [add/remove/modify] [element] to/from the scene. Ensure the change is [description of how the change should integrate]. Prompt "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and matches the soft lighting of the photo." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompt: "A photorealistic picture of a fluffy ginger cat sitting on a wooden floor, looking directly at the camera. Soft, natural light from a window." image_input = Image . open ( '/path/to/your/cat_photo.png' ) text_input = """Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ text_input , image_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'cat_with_hat.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "/path/to/your/cat_photo.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { text : "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off." }, { inlineData : { mimeType : "image/png" , data : base64Image , }, }, ]; const response = await ai . models . 
generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "cat_with_hat.png" , buffer ); console . log ( "Image saved as cat_with_hat.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/your/cat_photo.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { genai . NewPartFromText ( "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off." ), & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "cat_with_hat.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/cat_photo.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ {\"text\": \"Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off.\"}, { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG_BASE64 \" } } ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > cat_with_hat.png Input Output A photorealistic picture of a fluffy ginger cat... Using the provided image of my cat, please add a small, knitted wizard hat... 2. Inpainting (Semantic masking) Conversationally define a "mask" to edit a specific part of an image while leaving the rest untouched. Template Using the provided image, change only the [specific element] to [new element/description]. Keep everything else in the image exactly the same, preserving the original style, lighting, and composition. Prompt "Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompt: "A wide shot of a modern, well-lit living room with a prominent blue sofa in the center. A coffee table is in front of it and a large window is in the background." living_room_image = Image . 
open ( '/path/to/your/living_room.png' ) text_input = """Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ living_room_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'living_room_edited.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "/path/to/your/living_room.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image , }, }, { text : "Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "living_room_edited.png" , buffer ); console . log ( "Image saved as living_room_edited.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/your/living_room.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, genai . NewPartFromText ( "Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "living_room_edited.png" _ = os . 
WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/living_room.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG_BASE64 \" } }, {\"text\": \"Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > living_room_edited.png Input Output A wide shot of a modern, well-lit living room... Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa... 3. Style transfer Provide an image and ask the model to recreate its content in a different artistic style. Template Transform the provided photograph of [subject] into the artistic style of [artist/art style]. Preserve the original composition but render it with [description of stylistic elements]. Prompt "Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompt: "A photorealistic, high-resolution photograph of a busy city street in New York at night, with bright neon signs, yellow taxis, and tall skyscrapers." city_image = Image . open ( '/path/to/your/city.png' ) text_input = """Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ city_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'city_style_transfer.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "/path/to/your/city.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image , }, }, { text : "Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows." 
}, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "city_style_transfer.png" , buffer ); console . log ( "Image saved as city_style_transfer.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/your/city.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, genai . NewPartFromText ( "Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "city_style_transfer.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/city.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG_BASE64 \" } }, {\"text\": \"Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > city_style_transfer.png Input Output A photorealistic, high-resolution photograph of a busy city street... Transform the provided photograph of a modern city street at night... 4. Advanced composition: Combining multiple images Provide multiple images as context to create a new, composite scene. This is perfect for product mockups or creative collages. Template Create a new image by combining the elements from the provided images. Take the [element from image 1] and place it with/on the [element from image 2]. The final image should be a [description of the final scene]. Prompt "Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. 
Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompts: # 1. Dress: "A professionally shot photo of a blue floral summer dress on a plain white background, ghost mannequin style." # 2. Model: "Full-body shot of a woman with her hair in a bun, smiling, standing against a neutral grey studio background." dress_image = Image . open ( '/path/to/your/dress.png' ) model_image = Image . open ( '/path/to/your/model.png' ) text_input = """Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ dress_image , model_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'fashion_ecommerce_shot.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath1 = "/path/to/your/dress.png" ; const imageData1 = fs . readFileSync ( imagePath1 ); const base64Image1 = imageData1 . toString ( "base64" ); const imagePath2 = "/path/to/your/model.png" ; const imageData2 = fs . readFileSync ( imagePath2 ); const base64Image2 = imageData2 . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image1 , }, }, { inlineData : { mimeType : "image/png" , data : base64Image2 , }, }, { text : "Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "fashion_ecommerce_shot.png" , buffer ); console . log ( "Image saved as fashion_ecommerce_shot.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imgData1 , _ := os . ReadFile ( "/path/to/your/dress.png" ) imgData2 , _ := os . ReadFile ( "/path/to/your/model.png" ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData1 , }, }, & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData2 , }, }, genai . NewPartFromText ( "Create a professional e-commerce fashion photo. 
Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "fashion_ecommerce_shot.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH1 = /path/to/your/dress.png IMG_PATH2 = /path/to/your/model.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG1_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH1 " 2>&1 ) IMG2_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH2 " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG1_BASE64 \" } }, { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG2_BASE64 \" } }, {\"text\": \"Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > fashion_ecommerce_shot.png Input 1 Input 2 Output A professionally shot photo of a blue floral summer dress... Full-body shot of a woman with her hair in a bun... Create a professional e-commerce fashion photo... 5. High-fidelity detail preservation To ensure critical details (like a face or logo) are preserved during an edit, describe them in great detail along with your edit request. Template Using the provided images, place [element from image 2] onto [element from image 1]. Ensure that the features of [element from image 1] remain completely unchanged. The added element should [description of how the element should integrate]. Prompt "Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompts: # 1. Woman: "A professional headshot of a woman with brown hair and blue eyes, wearing a plain black t-shirt, against a neutral studio background." # 2. Logo: "A simple, modern logo with the letters 'G' and 'A' in a white circle." woman_image = Image . open ( '/path/to/your/woman.png' ) logo_image = Image . open ( '/path/to/your/logo.png' ) text_input = """Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. 
The logo should look like it's naturally printed on the fabric, following the folds of the shirt.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ woman_image , logo_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'woman_with_logo.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath1 = "/path/to/your/woman.png" ; const imageData1 = fs . readFileSync ( imagePath1 ); const base64Image1 = imageData1 . toString ( "base64" ); const imagePath2 = "/path/to/your/logo.png" ; const imageData2 = fs . readFileSync ( imagePath2 ); const base64Image2 = imageData2 . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image1 , }, }, { inlineData : { mimeType : "image/png" , data : base64Image2 , }, }, { text : "Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "woman_with_logo.png" , buffer ); console . log ( "Image saved as woman_with_logo.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imgData1 , _ := os . ReadFile ( "/path/to/your/woman.png" ) imgData2 , _ := os . ReadFile ( "/path/to/your/logo.png" ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData1 , }, }, & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData2 , }, }, genai . NewPartFromText ( "Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "woman_with_logo.png" _ = os . 
WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH1 = /path/to/your/woman.png IMG_PATH2 = /path/to/your/logo.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG1_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH1 " 2>&1 ) IMG2_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH2 " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG1_BASE64 \" } }, { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG2_BASE64 \" } }, {\"text\": \"Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > woman_with_logo.png Input 1 Input 2 Output A professional headshot of a woman with brown hair and blue eyes... A simple, modern logo with the letters 'G' and 'A'... Take the first image of the woman with brown hair, blue eyes, and a neutral expression... Best Practices To elevate your results from good to great, incorporate these professional strategies into your workflow. Be Hyper-Specific: The more detail you provide, the more control you have. Instead of "fantasy armor," describe it: "ornate elven plate armor, etched with silver leaf patterns, with a high collar and pauldrons shaped like falcon wings." Provide Context and Intent: Explain the purpose of the image. The model's understanding of context will influence the final output. For example, "Create a logo for a high-end, minimalist skincare brand" will yield better results than just "Create a logo." Iterate and Refine: Don't expect a perfect image on the first try. Use the conversational nature of the model to make small changes. Follow up with prompts like, "That's great, but can you make the lighting a bit warmer?" or "Keep everything the same, but change the character's expression to be more serious." Use Step-by-Step Instructions: For complex scenes with many elements, break your prompt into steps. "First, create a background of a serene, misty forest at dawn. Then, in the foreground, add a moss-covered ancient stone altar. Finally, place a single, glowing sword on top of the altar." Use "Semantic Negative Prompts": Instead of saying "no cars," describe the desired scene positively: "an empty, deserted street with no signs of traffic." Control the Camera: Use photographic and cinematic language to control the composition. Terms like wide-angle shot , macro shot , low-angle perspective . Limitations For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN. Image generation does not support audio or video inputs. The model won't always follow the exact number of image outputs that the user explicitly asked for. The model works best with up to 3 images as an input. When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text. Uploading images of children is not currently supported in EEA, CH, and UK. All generated images include a SynthID watermark . 
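The "Iterate and Refine" practice above can be scripted as a multi-turn exchange rather than a series of one-off calls. The following is a minimal Python sketch, assuming the google-genai SDK's chat helper (client.chats.create / send_message) keeps previously generated image parts in the conversation context; the prompts and output file names are placeholder examples, not part of the official documentation.

from google import genai
from PIL import Image
from io import BytesIO

client = genai.Client()

# A chat session keeps earlier turns (including generated images) in context,
# so follow-up prompts can request small, targeted adjustments.
chat = client.chats.create(model="gemini-2.5-flash-image-preview")

def save_first_image(response, filename):
    # Save the first inline image part in the response, if one is present.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save(filename)
            return True
    return False

# Turn 1: establish the base image.
response = chat.send_message(
    "Create a picture of a cozy reading nook with a green armchair by a window."
)
save_first_image(response, "nook_v1.png")

# Turn 2: a small refinement, following the iterate-and-refine practice above.
response = chat.send_message(
    "Keep everything the same, but make the lighting a bit warmer."
)
save_first_image(response, "nook_v2.png")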
When to use Imagen In addition to using Gemini's built-in image generation capabilities, you can also access Imagen , our specialized image generation model, through the Gemini API. Attribute Imagen Gemini Native Image Strengths Most capable image generation model to date. Recommended for photorealistic images, sharper clarity, improved spelling and typography. Default recommendation. Unparalleled flexibility, contextual understanding, and simple, mask-free editing. Uniquely capable of multi-turn conversational editing. Availability Generally available Preview (Production usage allowed) Latency Low. Optimized for near-real-time performance. Higher. More computation is required for its advanced capabilities. Cost Cost-effective for specialized tasks. $0.02/image to $0.12/image Token-based pricing. $30 per 1 million tokens for image output (image output tokenized at 1290 tokens per image flat, up to 1024x1024px) Recommended tasks Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities. Infusing branding, style, or generating logos and product designs. Generating advanced spelling or typography. Interleaved text and image generation to seamlessly blend text and images. Combine creative elements from multiple images with a single prompt. Make highly specific edits to images, modify individual elements with simple language commands, and iteratively work on an image. Apply a specific design or texture from one image to another while preserving the original subject's form and details. Imagen 4 should be your go-to model when you start generating images with Imagen. Choose Imagen 4 Ultra for advanced use cases or when you need the best image quality (note that it can only generate one image at a time). What's next Find more examples and code samples in the cookbook guide . Check out the Veo guide to learn how to generate videos with the Gemini API. To learn more about Gemini models, see Gemini models . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-08 UTC.
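A quick footnote on the cost comparison above: at a flat 1290 tokens per generated image and $30 per 1 million image-output tokens, Gemini's native image output works out to roughly $0.04 per image. A short sketch of that arithmetic (figures taken from the comparison above; check the current pricing page before relying on them):

# Per-image cost estimate for Gemini native image output, using the figures quoted above.
tokens_per_image = 1290            # flat tokens per generated image (up to 1024x1024 px)
price_per_million_tokens = 30.00   # USD per 1M image-output tokens

cost_per_image = tokens_per_image / 1_000_000 * price_per_million_tokens
print(f"Gemini native image output: ~${cost_per_image:.4f} per image")  # ~$0.0387

# For comparison, Imagen is billed per image ($0.02 to $0.12 depending on the model).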
text_content/docs_image-generation_56ff9b7f.txt CHANGED
@@ -1,5 +1,5 @@
  URL: https://ai.google.dev/gemini-api/docs/image-generation#imagen
- Title: Image generation  |  Gemini API  |  Google AI for Developers
  ==================================================
 
- Image generation | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Image generation You can generate images using the Gemini API with either Gemini's built-in multimodal capabilities or Imagen, Google's specialized image generation models. For most use cases, start with Gemini . Choose Imagen for specialized tasks where image quality is critical. See Choosing the right model section for more guidance. All generated images include a SynthID watermark . Before you begin Ensure you use a supported model and version for image generation: For Gemini , use Gemini 2.0 Flash Preview Image Generation. For Imagen , use one of the Imagen models (Imagen 3, Imagen 4 or Imagen 4 Ultra). Note that those models are only available on the Paid tier . You can access both Gemini and Imagen models using the same libraries. Note: Image generation may not be available in all regions and countries, review our Models page for more information. Generate images using Gemini Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both to achieve various image-related tasks, such as image generation and editing. You must include responseModalities : ["TEXT", "IMAGE"] in your configuration. Image-only output is not supported with these models. Image generation (text-to-image) The following code demonstrates how to generate an image based on a descriptive prompt: Python from google import genai from google.genai import types from PIL import Image from io import BytesIO import base64 client = genai . Client () contents = ( 'Hi, can you create a 3d rendered image of a pig ' 'with wings and a top hat flying over a happy ' 'futuristic scifi city with lots of greenery?' ) response = client . models . generate_content ( model = "gemini-2.0-flash-preview-image-generation" , contents = contents , config = types . GenerateContentConfig ( response_modalities = [ 'TEXT' , 'IMAGE' ] ) ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO (( part . inline_data . data ))) image . save ( 'gemini-native-image.png' ) image . show () JavaScript Note: We've released the Google SDK for TypeScript and JavaScript in preview launch stage . Use this SDK for image generation features. import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const contents = "Hi, can you create a 3d rendered image of a pig " + "with wings and a top hat flying over a happy " + "futuristic scifi city with lots of greenery?" ; // Set responseModalities to include "Image" so the model can generate an image const response = await ai . models . generateContent ({ model : "gemini-2.0-flash-preview-image-generation" , contents : contents , config : { responseModalities : [ Modality . TEXT , Modality . IMAGE ], }, }); for ( const part of response . candidates [ 0 ]. content . parts ) { // Based on the part type, either show the text or save the image if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . 
inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateContentConfig { ResponseModalities : [] string { "TEXT" , "IMAGE" }, } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.0-flash-preview-image-generation" , genai . Text ( "Hi, can you create a 3d rendered image of a pig " + "with wings and a top hat flying over a happy " + "futuristic scifi city with lots of greenery?" ), config , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"} ] }], "generationConfig":{"responseModalities":["TEXT","IMAGE"]} }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-native-image.png AI-generated image of a fantastical flying pig Image editing (text-and-image-to-image) To perform image editing, add an image as input. The following example demonstrates uploading base64 encoded images. For multiple images and larger payloads, check the image input section. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO import PIL.Image image = PIL . Image . open ( '/path/to/image.png' ) client = genai . Client () text_input = ( 'Hi, This is a picture of me.' 'Can you add a llama next to me?' ,) response = client . models . generate_content ( model = "gemini-2.0-flash-preview-image-generation" , contents = [ text_input , image ], config = types . GenerateContentConfig ( response_modalities = [ 'TEXT' , 'IMAGE' ] ) ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO (( part . inline_data . data ))) image . show () JavaScript Note: We've released the Google SDK for TypeScript and JavaScript in preview launch stage . Use this SDK for image generation features. import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); // Load the image from the local file system const imagePath = "path/to/image.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); // Prepare the content parts const contents = [ { text : "Can you add a llama next to the image?" }, { inlineData : { mimeType : "image/png" , data : base64Image , }, }, ]; // Set responseModalities to include "Image" so the model can generate an image const response = await ai . models . 
generateContent ({ model : "gemini-2.0-flash-preview-image-generation" , contents : contents , config : { responseModalities : [ Modality . TEXT , Modality . IMAGE ], }, }); for ( const part of response . candidates [ 0 ]. content . parts ) { // Based on the part type, either show the text or save the image if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/image.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { genai . NewPartFromText ( "Hi, This is a picture of me. Can you add a llama next to me?" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } config := & genai . GenerateContentConfig { ResponseModalities : [] string { "TEXT" , "IMAGE" }, } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.0-flash-preview-image-generation" , contents , config , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/image1.jpeg if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-preview-image-generation:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ {\"text\": \"'Hi, This is a picture of me. Can you add a llama next to me\"}, { \"inline_data\": { \"mime_type\":\"image/jpeg\", \"data\": \" $IMG_BASE64 \" } } ] }], \"generationConfig\": {\"responseModalities\": [\"TEXT\", \"IMAGE\"]} }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-edited-image.png Other image generation modes Gemini supports other image interaction modes based on prompt structure and context, including: Text to image(s) and text (interleaved): Outputs images with related text. Example prompt: "Generate an illustrated recipe for a paella." Image(s) and text to image(s) and text (interleaved) : Uses input images and text to create new related images and text. Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? can you update the image?" Multi-turn image editing (chat): Keep generating / editing images conversationally. Example prompts: [upload an image of a blue car.] , "Turn this car into a convertible.", "Now change the color to yellow." Limitations For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN. Image generation does not support audio or video inputs. Image generation may not always trigger: The model may output text only. Try asking for image outputs explicitly (e.g. 
"generate an image", "provide images as you go along", "update the image"). The model may stop generating partway through. Try again or try a different prompt. When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text. There are some regions/countries where Image generation is not available. See Models for more information. Generate images using the Imagen models This example demonstrates generating images with an Imagen model : Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () response = client . models . generate_images ( model = 'imagen-4.0-generate-preview-06-06' , prompt = 'Robot holding a red skateboard' , config = types . GenerateImagesConfig ( number_of_images = 4 , ) ) for generated_image in response . generated_images : generated_image . image . show () JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const response = await ai . models . generateImages ({ model : 'imagen-4.0-generate-preview-06-06' , prompt : 'Robot holding a red skateboard' , config : { numberOfImages : 4 , }, }); let idx = 1 ; for ( const generatedImage of response . generatedImages ) { let imgBytes = generatedImage . image . imageBytes ; const buffer = Buffer . from ( imgBytes , "base64" ); fs . writeFileSync ( `imagen- ${ idx } .png` , buffer ); idx ++ ; } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } config := & genai . GenerateImagesConfig { NumberOfImages : 4 , } response , _ := client . Models . GenerateImages ( ctx , "imagen-4.0-generate-preview-06-06" , "Robot holding a red skateboard" , config , ) for n , image := range response . GeneratedImages { fname := fmt . Sprintf ( "imagen-%d.png" , n ) _ = os . WriteFile ( fname , image . Image . ImageBytes , 0644 ) } } REST curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/imagen-4.0-generate-preview-06-06:predict" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "instances": [ { "prompt": "Robot holding a red skateboard" } ], "parameters": { "sampleCount": 4 } }' AI-generated image of a robot holding a red skateboard Imagen configuration Imagen supports English only prompts at this time and the following parameters: Note: Naming conventions of parameters vary by programming language. numberOfImages : The number of images to generate, from 1 to 4 (inclusive). The default is 4. For Imagen 4 Ultra, it defaults to 1 as only one image can be generated at a time. aspectRatio : Changes the aspect ratio of the generated image. Supported values are "1:1" , "3:4" , "4:3" , "9:16" , and "16:9" . The default is "1:1" . personGeneration : Allow the model to generate images of people. The following values are supported: "dont_allow" : Block generation of images of people. "allow_adult" : Generate images of adults, but not children. This is the default. "allow_all" : Generate images that include adults and children. Note: The "allow_all" parameter value is not allowed in EU, UK, CH, MENA locations. Choosing the right model Choose Gemini when: You need contextually relevant images that leverage world knowledge and reasoning. Seamlessly blending text and images is important. 
You want accurate visuals embedded within long text sequences. You want to edit images conversationally while maintaining context. Choose Imagen when: Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities. Performing specialized editing tasks like product background updates or image upscaling. Infusing branding, style, or generating logos and product designs. Imagen 4 should be your go-to model starting to generate images with Imagen. Choose Imagen 4 Ultra for advanced use-cases or when you need the best image quality. Note that Imagen 4 Ultra can only generate one image at a time. Imagen prompt guide This section of the Imagen guide shows you how modifying a text-to-image prompt can produce different results, along with examples of images you can create. Prompt writing basics Note: Maximum prompt length is 480 tokens. A good prompt is descriptive and clear, and makes use of meaningful keywords and modifiers. Start by thinking of your subject , context , and style . Image text: A sketch ( style ) of a modern apartment building ( subject ) surrounded by skyscrapers ( context and background ). Subject : The first thing to think about with any prompt is the subject : the object, person, animal, or scenery you want an image of. Context and background: Just as important is the background or context in which the subject will be placed. Try placing your subject in a variety of backgrounds. For example, a studio with a white background, outdoors, or indoor environments. Style: Finally, add the style of image you want. Styles can be general (painting, photograph, sketches) or very specific (pastel painting, charcoal drawing, isometric 3D). You can also combine styles. After you write a first version of your prompt, refine your prompt by adding more details until you get to the image that you want. Iteration is important. Start by establishing your core idea, and then refine and expand upon that core idea until the generated image is close to your vision. Prompt: A park in the spring next to a lake Prompt: A park in the spring next to a lake, the sun sets across the lake, golden hour Prompt: A park in the spring next to a lake, the sun sets across the lake, golden hour, red wildflowers Imagen models can transform your ideas into detailed images, whether your prompts are short or long and detailed. Refine your vision through iterative prompting, adding details until you achieve the perfect result. Short prompts let you generate an image quickly. Prompt: close-up photo of a woman in her 20s, street photography, movie still, muted orange warm tones Longer prompts let you add specific details and build your image. Prompt: captivating photo of a woman in her 20s utilizing a street photography style. The image should look like a movie still with muted orange warm tones. Additional advice for Imagen prompt writing: Use descriptive language : Employ detailed adjectives and adverbs to paint a clear picture for Imagen. Provide context : If necessary, include background information to aid the AI's understanding. Reference specific artists or styles : If you have a particular aesthetic in mind, referencing specific artists or art movements can be helpful. Use prompt engineering tools : Consider exploring prompt engineering tools or resources to help you refine your prompts and achieve optimal results. 
Enhancing the facial details in your personal and group images : Specify facial details as a focus of the photo (for example, use the word "portrait" in the prompt). Generate text in images Imagen models can add text into images, opening up more creative image generation possibilities. Use the following guidance to get the most out of this feature: Iterate with confidence : You might have to regenerate images until you achieve the look you want. Imagen's text integration is still evolving, and sometimes multiple attempts yield the best results. Keep it short : Limit text to 25 characters or less for optimal generation. Multiple phrases : Experiment with two or three distinct phrases to provide additional information. Avoid exceeding three phrases for cleaner compositions. Prompt: A poster with the text "Summerland" in bold font as a title, underneath this text is the slogan "Summer never felt so good" Guide Placement : While Imagen can attempt to position text as directed, expect occasional variations. This feature is continually improving. Inspire font style : Specify a general font style to subtly influence Imagen's choices. Don't rely on precise font replication, but expect creative interpretations. Font size : Specify a font size or a general indication of size (for example, small , medium , large ) to influence the font size generation. Prompt parameterization To better control output results, you might find it helpful to parameterize the inputs into Imagen. For example, suppose you want your customers to be able to generate logos for their business, and you want to make sure logos are always generated on a solid color background. You also want to limit the options that the client can select from a menu. In this example, you can create a parameterized prompt similar to the following: A {logo_style} logo for a {company_area} company on a solid color background. Include the text {company_name} . In your custom user interface, the customer can input the parameters using a menu, and their chosen value populates the prompt Imagen receives. For example: Prompt: A minimalist logo for a health care company on a solid color background. Include the text Journey . Prompt: A modern logo for a software company on a solid color background. Include the text Silo . Prompt: A traditional logo for a baking company on a solid color background. Include the text Seed . Advanced prompt writing techniques Use the following examples to create more specific prompts based on attributes like photography descriptors, shapes and materials, historical art movements, and image quality modifiers. Photography Prompt includes: "A photo of..." To use this style, start with using keywords that clearly tell Imagen that you're looking for a photograph. Start your prompts with "A photo of. . ." . For example: Prompt: A photo of coffee beans in a kitchen on a wooden surface Prompt: A photo of a chocolate bar on a kitchen counter Prompt: A photo of a modern building with water in the background Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Photography modifiers In the following examples, you can see several photography-specific modifiers and parameters. You can combine multiple modifiers for more precise control. 
Camera Proximity - Close up, taken from far away Prompt: A close-up photo of coffee beans Prompt: A zoomed out photo of a small bag of coffee beans in a messy kitchen Camera Position - aerial, from below Prompt: aerial photo of urban city with skyscrapers Prompt: A photo of a forest canopy with blue skies from below Lighting - natural, dramatic, warm, cold Prompt: studio photo of a modern arm chair, natural lighting Prompt: studio photo of a modern arm chair, dramatic lighting Camera Settings - motion blur, soft focus, bokeh, portrait Prompt: photo of a city with skyscrapers from the inside of a car with motion blur Prompt: soft focus photograph of a bridge in an urban city at night Lens types - 35mm, 50mm, fisheye, wide angle, macro Prompt: photo of a leaf, macro lens Prompt: street photography, new york city, fisheye lens Film types - black and white, polaroid Prompt: a polaroid portrait of a dog wearing sunglasses Prompt: black and white photo of a dog wearing sunglasses Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Illustration and art Prompt includes: "A painting of..." , "A sketch of..." Art styles vary from monochrome styles like pencil sketches, to hyper-realistic digital art. For example, the following images use the same prompt with different styles: "An [art style or creation technique] of an angular sporty electric sedan with skyscrapers in the background" Prompt: A technical pencil drawing of an angular... Prompt: A charcoal drawing of an angular... Prompt: A color pencil drawing of an angular... Prompt: A pastel painting of an angular... Prompt: A digital art of an angular... Prompt: An art deco (poster) of an angular... Image source: Each image was generated using its corresponding text prompt with the Imagen 2 model. Shapes and materials Prompt includes: "...made of..." , "...in the shape of..." One of the strengths of this technology is that you can create imagery that is otherwise difficult or impossible. For example, you can recreate your company logo in different materials and textures. Prompt: a duffle bag made of cheese Prompt: neon tubes in the shape of a bird Prompt: an armchair made of paper , studio photo, origami style Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Historical art references Prompt includes: "...in the style of..." Certain styles have become iconic over the years. The following are some ideas of historical painting or art styles that you can try. "generate an image in the style of [art period or movement] : a wind farm" Prompt: generate an image in the style of an impressionist painting : a wind farm Prompt: generate an image in the style of a renaissance painting : a wind farm Prompt: generate an image in the style of pop art : a wind farm Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Image quality modifiers Certain keywords can let the model know that you're looking for a high-quality asset. Examples of quality modifiers include the following: General Modifiers - high-quality, beautiful, stylized Photos - 4K, HDR, Studio Photo Art, Illustration - by a professional, detailed The following are a few examples of prompts without quality modifiers and the same prompt with quality modifiers. 
Prompt (no quality modifiers): a photo of a corn stalk Prompt (with quality modifiers): 4k HDR beautiful photo of a corn stalk taken by a professional photographer Image source: Each image was generated using its corresponding text prompt with the Imagen 3 model. Aspect ratios Imagen image generation lets you set five distinct image aspect ratios. Square (1:1, default) - A standard square photo. Common uses for this aspect ratio include social media posts. Fullscreen (4:3) - This aspect ratio is commonly used in media or film. It is also the dimensions of most old (non-widescreen) TVs and medium format cameras. It captures more of the scene horizontally (compared to 1:1), making it a preferred aspect ratio for photography. Prompt: close up of a musician's fingers playing the piano, black and white film, vintage (4:3 aspect ratio) Prompt: A professional studio photo of french fries for a high end restaurant, in the style of a food magazine (4:3 aspect ratio) Portrait full screen (3:4) - This is the fullscreen aspect ratio rotated 90 degrees. This lets you capture more of the scene vertically compared to the 1:1 aspect ratio. Prompt: a woman hiking, close up of her boots reflected in a puddle, large mountains in the background, in the style of an advertisement, dramatic angles (3:4 aspect ratio) Prompt: aerial shot of a river flowing up a mystical valley (3:4 aspect ratio) Widescreen (16:9) - This ratio has replaced 4:3 and is now the most common aspect ratio for TVs, monitors, and mobile phone screens (landscape). Use this aspect ratio when you want to capture more of the background (for example, scenic landscapes). Prompt: a man wearing all white clothing sitting on the beach, close up, golden hour lighting (16:9 aspect ratio) Portrait (9:16) - This ratio is widescreen but rotated. This is a relatively new aspect ratio that has been popularized by short form video apps (for example, YouTube Shorts). Use this for tall objects with strong vertical orientations such as buildings, trees, waterfalls, or other similar objects. Prompt: a digital render of a massive skyscraper, modern, grand, epic with a beautiful sunset in the background (9:16 aspect ratio) Photorealistic images Different versions of the image generation model might offer a mix of artistic and photorealistic output. Use the following wording in prompts to generate more photorealistic output, based on the subject you want to generate. Note: Take these keywords as general guidance when you try to create photorealistic images. They aren't required to achieve your goal.
Use case Lens type Focal lengths Additional details People (portraits) Prime, zoom 24-35mm black and white film, Film noir, Depth of field, duotone (mention two colors) Food, insects, plants (objects, still life) Macro 60-105mm High detail, precise focusing, controlled lighting Sports, wildlife (motion) Telephoto zoom 100-400mm Fast shutter speed, Action or movement tracking Astronomical, landscape (wide-angle) Wide-angle 10-24mm Long exposure times, sharp focus, long exposure, smooth water or clouds Portraits Use case Lens type Focal lengths Additional details People (portraits) Prime, zoom 24-35mm black and white film, Film noir, Depth of field, duotone (mention two colors) Using several keywords from the table, Imagen can generate the following portraits: Prompt: A woman, 35mm portrait, blue and grey duotones Model: imagen-3.0-generate-002 Prompt: A woman, 35mm portrait, film noir Model: imagen-3.0-generate-002 Objects Use case Lens type Focal lengths Additional details Food, insects, plants (objects, still life) Macro 60-105mm High detail, precise focusing, controlled lighting Using several keywords from the table, Imagen can generate the following object images: Prompt: leaf of a prayer plant, macro lens, 60mm Model: imagen-3.0-generate-002 Prompt: a plate of pasta, 100mm Macro lens Model: imagen-3.0-generate-002 Motion Use case Lens type Focal lengths Additional details Sports, wildlife (motion) Telephoto zoom 100-400mm Fast shutter speed, Action or movement tracking Using several keywords from the table, Imagen can generate the following motion images: Prompt: a winning touchdown, fast shutter speed, movement tracking Model: imagen-3.0-generate-002 Prompt: A deer running in the forest, fast shutter speed, movement tracking Model: imagen-3.0-generate-002 Wide-angle Use case Lens type Focal lengths Additional details Astronomical, landscape (wide-angle) Wide-angle 10-24mm Long exposure times, sharp focus, long exposure, smooth water or clouds Using several keywords from the table, Imagen can generate the following wide-angle images: Prompt: an expansive mountain range, landscape wide angle 10mm Model: imagen-3.0-generate-002 Prompt: a photo of the moon, astro photography, wide angle 10mm Model: imagen-3.0-generate-002 What's next Check out the Veo guide to learn how to generate videos with the Gemini API. To learn more about Gemini models, see Gemini models and Experimental models . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-27 UTC.
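The prompt parameterization, aspect ratio, and photography keyword guidance above can be combined in a single call. The following is a minimal Python sketch, assuming the generate_images API shown earlier in this guide; the menu values (logo_style, company_area, company_name) and the output file name are hypothetical placeholders, and the image .save() helper is used instead of the .show() call from the earlier example.

from google import genai
from google.genai import types

client = genai.Client()

# Hypothetical values collected from a menu in a custom UI.
params = {
    "logo_style": "minimalist",
    "company_area": "health care",
    "company_name": "Journey",
}

# The parameterized template from the prompt parameterization section above.
prompt = (
    "A {logo_style} logo for a {company_area} company on a solid color "
    "background. Include the text {company_name}."
).format(**params)

response = client.models.generate_images(
    model="imagen-3.0-generate-002",
    prompt=prompt,
    config=types.GenerateImagesConfig(
        number_of_images=1,
        aspect_ratio="1:1",  # square (the default) suits logos
    ),
)

# Save the single generated image to disk.
response.generated_images[0].image.save("journey_logo.png")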
 
  URL: https://ai.google.dev/gemini-api/docs/image-generation#imagen
+ Title: Image generation with Gemini (aka Nano Banana)  |  Gemini API  |  Google AI for Developers
  ==================================================
 
+ Image generation with Gemini (aka Nano Banana) | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Image generation with Gemini (aka Nano Banana) Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both allowing you to create, edit, and iterate on visuals with unprecedented control: Text-to-Image: Generate high-quality images from simple or complex text descriptions. Image + Text-to-Image (Editing): Provide an image and use text prompts to add, remove, or modify elements, change the style, or adjust the color grading. Multi-Image to Image (Composition & Style Transfer): Use multiple input images to compose a new scene or transfer the style from one image to another. Iterative Refinement: Engage in a conversation to progressively refine your image over multiple turns, making small adjustments until it's perfect. High-Fidelity Text Rendering: Accurately generate images that contain legible and well-placed text, ideal for logos, diagrams, and posters. All generated images include a SynthID watermark . Image generation (text-to-image) The following code demonstrates how to generate an image based on a descriptive prompt. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () prompt = ( "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme" ) response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ prompt ], ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO ( part . inline_data . data )) image . save ( "generated_image.png" ) JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme" ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "Create a picture of a nano banana dish in a " + " fancy restaurant with a Gemini theme" ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . 
Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-native-image.png AI-generated image of a nano banana dish in a Gemini-themed restaurant Image editing (text-and-image-to-image) Reminder : Make sure you have the necessary rights to any images you upload. Don't generate content that infringe on others' rights, including videos or images that deceive, harass, or harm. Your use of this generative AI service is subject to our Prohibited Use Policy . The following example demonstrates uploading base64 encoded images. For multiple images, larger payloads, and supported MIME types, check the Image understanding page. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () prompt = ( "Create a picture of my cat eating a nano-banana in a " "fancy restaurant under the Gemini constellation" , ) image = Image . open ( "/path/to/cat_image.png" ) response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ prompt , image ], ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO ( part . inline_data . data )) image . save ( "generated_image.png" ) JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "path/to/cat_image.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { text : "Create a picture of my cat eating a nano-banana in a" + "fancy restaurant under the Gemini constellation" }, { inlineData : { mimeType : "image/png" , data : base64Image , }, }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/cat_image.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { genai . NewPartFromText ( "Create a picture of my cat eating a nano-banana in a fancy restaurant under the Gemini constellation" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . 
GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/cat_image.jpeg if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ {\"text\": \"'Create a picture of my cat eating a nano-banana in a fancy restaurant under the Gemini constellation\"}, { \"inline_data\": { \"mime_type\":\"image/jpeg\", \"data\": \" $IMG_BASE64 \" } } ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-edited-image.png AI-generated image of a cat eating a nano banana Other image generation modes Gemini supports other image interaction modes based on prompt structure and context, including: Text to image(s) and text (interleaved): Outputs images with related text. Example prompt: "Generate an illustrated recipe for a paella." Image(s) and text to image(s) and text (interleaved) : Uses input images and text to create new related images and text. Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? can you update the image?" Multi-turn image editing (chat): Keep generating and editing images conversationally. Example prompts: [upload an image of a blue car.] , "Turn this car into a convertible.", "Now change the color to yellow." Prompting guide and strategies Mastering Gemini 2.5 Flash Image Generation starts with one fundamental principle: Describe the scene, don't just list keywords. The model's core strength is its deep language understanding. A narrative, descriptive paragraph will almost always produce a better, more coherent image than a list of disconnected words. Prompts for generating images The following strategies will help you create effective prompts to generate exactly the images you're looking for. 1. Photorealistic scenes For realistic images, use photography terms. Mention camera angles, lens types, lighting, and fine details to guide the model toward a photorealistic result. Template A photorealistic [shot type] of [subject], [action or expression], set in [environment]. The scene is illuminated by [lighting description], creating a [mood] atmosphere. Captured with a [camera/lens details], emphasizing [key textures and details]. The image should be in a [aspect ratio] format. Prompt A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful. Vertical portrait orientation. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . 
Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'photorealistic_example.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "photorealistic_example.png" , buffer ); console . log ( "Image saved as photorealistic_example.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "photorealistic_example.png" _ = os . 
WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > photorealistic_example.png A photorealistic close-up portrait of an elderly Japanese ceramicist... 2. Stylized illustrations & stickers To create stickers, icons, or assets, be explicit about the style and request a transparent background. Template A [style] sticker of a [subject], featuring [key characteristics] and a [color palette]. The design should have [line style] and [shading style]. The background must be transparent. Prompt A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'red_panda_sticker.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "red_panda_sticker.png" , buffer ); console . log ( "Image saved as red_panda_sticker.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . 
GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "red_panda_sticker.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It' "'" 's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > red_panda_sticker.png A kawaii-style sticker of a happy red panda... 3. Accurate text in images Gemini excels at rendering text. Be clear about the text, the font style (descriptively), and the overall design. Template Create a [image type] for [brand/concept] with the text "[text to render]" in a [font style]. The design should be [style description], with a [color scheme]. Prompt Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'logo_example.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "logo_example.png" , buffer ); console . 
log ( "Image saved as logo_example.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "logo_example.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "Create a modern, minimalist logo for a coffee shop called ' "'" 'The Daily Grind' "'" '. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > logo_example.png Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'... 4. Product mockups & commercial photography Perfect for creating clean, professional product shots for e-commerce, advertising, or branding. Template A high-resolution, studio-lit product photograph of a [product description] on a [background surface/description]. The lighting is a [lighting setup, e.g., three-point softbox setup] to [lighting purpose]. The camera angle is a [angle type] to showcase [specific feature]. Ultra-realistic, with sharp focus on [key detail]. [Aspect ratio]. Prompt A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'product_mockup.png' ) image . 
show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "product_mockup.png" , buffer ); console . log ( "Image saved as product_mockup.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "product_mockup.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > product_mockup.png A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug... 5. Minimalist & negative space design Excellent for creating backgrounds for websites, presentations, or marketing materials where text will be overlaid. Template A minimalist composition featuring a single [subject] positioned in the [bottom-right/top-left/etc.] of the frame. The background is a vast, empty [color] canvas, creating significant negative space. Soft, subtle lighting. [Aspect ratio]. Prompt A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. 
The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'minimalist_design.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "minimalist_design.png" , buffer ); console . log ( "Image saved as minimalist_design.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "minimalist_design.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > minimalist_design.png A minimalist composition featuring a single, delicate red maple leaf... 6. 
Sequential art (Comic panel / Storyboard) Builds on character consistency and scene description to create panels for visual storytelling. Template A single comic book panel in a [art style] style. In the foreground, [character description and action]. In the background, [setting details]. The panel has a [dialogue/caption box] with the text "[Text]". The lighting creates a [mood] mood. [Aspect ratio]. Prompt A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads "The city was a tough place to keep secrets." The lighting is harsh, creating a dramatic, somber mood. Landscape. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads \" The city was a tough place to keep secrets. \" The lighting is harsh, creating a dramatic, somber mood. Landscape." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'comic_panel.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads \"The city was a tough place to keep secrets.\" The lighting is harsh, creating a dramatic, somber mood. Landscape." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "comic_panel.png" , buffer ); console . log ( "Image saved as comic_panel.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. 
A caption box at the top reads \"The city was a tough place to keep secrets.\" The lighting is harsh, creating a dramatic, somber mood. Landscape." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "comic_panel.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads \"The city was a tough place to keep secrets.\" The lighting is harsh, creating a dramatic, somber mood. Landscape."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > comic_panel.png A single comic book panel in a gritty, noir art style... Prompts for editing images These examples show how to provide images alongside your text prompts for editing, composition, and style transfer. 1. Adding and removing elements Provide an image and describe your change. The model will match the original image's style, lighting, and perspective. Template Using the provided image of [subject], please [add/remove/modify] [element] to/from the scene. Ensure the change is [description of how the change should integrate]. Prompt "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and matches the soft lighting of the photo." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompt: "A photorealistic picture of a fluffy ginger cat sitting on a wooden floor, looking directly at the camera. Soft, natural light from a window." image_input = Image . open ( '/path/to/your/cat_photo.png' ) text_input = """Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ text_input , image_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'cat_with_hat.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "/path/to/your/cat_photo.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { text : "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off." }, { inlineData : { mimeType : "image/png" , data : base64Image , }, }, ]; const response = await ai . models . 
generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "cat_with_hat.png" , buffer ); console . log ( "Image saved as cat_with_hat.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/your/cat_photo.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { genai . NewPartFromText ( "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off." ), & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "cat_with_hat.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/cat_photo.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ {\"text\": \"Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off.\"}, { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG_BASE64 \" } } ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > cat_with_hat.png Input Output A photorealistic picture of a fluffy ginger cat... Using the provided image of my cat, please add a small, knitted wizard hat... 2. Inpainting (Semantic masking) Conversationally define a "mask" to edit a specific part of an image while leaving the rest untouched. Template Using the provided image, change only the [specific element] to [new element/description]. Keep everything else in the image exactly the same, preserving the original style, lighting, and composition. Prompt "Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompt: "A wide shot of a modern, well-lit living room with a prominent blue sofa in the center. A coffee table is in front of it and a large window is in the background." living_room_image = Image . 
open ( '/path/to/your/living_room.png' ) text_input = """Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ living_room_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'living_room_edited.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "/path/to/your/living_room.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image , }, }, { text : "Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "living_room_edited.png" , buffer ); console . log ( "Image saved as living_room_edited.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/your/living_room.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, genai . NewPartFromText ( "Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "living_room_edited.png" _ = os . 
WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/living_room.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG_BASE64 \" } }, {\"text\": \"Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > living_room_edited.png Input Output A wide shot of a modern, well-lit living room... Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa... 3. Style transfer Provide an image and ask the model to recreate its content in a different artistic style. Template Transform the provided photograph of [subject] into the artistic style of [artist/art style]. Preserve the original composition but render it with [description of stylistic elements]. Prompt "Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompt: "A photorealistic, high-resolution photograph of a busy city street in New York at night, with bright neon signs, yellow taxis, and tall skyscrapers." city_image = Image . open ( '/path/to/your/city.png' ) text_input = """Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ city_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'city_style_transfer.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "/path/to/your/city.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image , }, }, { text : "Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows." 
}, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "city_style_transfer.png" , buffer ); console . log ( "Image saved as city_style_transfer.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/your/city.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, genai . NewPartFromText ( "Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "city_style_transfer.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/city.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG_BASE64 \" } }, {\"text\": \"Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > city_style_transfer.png Input Output A photorealistic, high-resolution photograph of a busy city street... Transform the provided photograph of a modern city street at night... 4. Advanced composition: Combining multiple images Provide multiple images as context to create a new, composite scene. This is perfect for product mockups or creative collages. Template Create a new image by combining the elements from the provided images. Take the [element from image 1] and place it with/on the [element from image 2]. The final image should be a [description of the final scene]. Prompt "Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. 
Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompts: # 1. Dress: "A professionally shot photo of a blue floral summer dress on a plain white background, ghost mannequin style." # 2. Model: "Full-body shot of a woman with her hair in a bun, smiling, standing against a neutral grey studio background." dress_image = Image . open ( '/path/to/your/dress.png' ) model_image = Image . open ( '/path/to/your/model.png' ) text_input = """Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ dress_image , model_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'fashion_ecommerce_shot.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath1 = "/path/to/your/dress.png" ; const imageData1 = fs . readFileSync ( imagePath1 ); const base64Image1 = imageData1 . toString ( "base64" ); const imagePath2 = "/path/to/your/model.png" ; const imageData2 = fs . readFileSync ( imagePath2 ); const base64Image2 = imageData2 . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image1 , }, }, { inlineData : { mimeType : "image/png" , data : base64Image2 , }, }, { text : "Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "fashion_ecommerce_shot.png" , buffer ); console . log ( "Image saved as fashion_ecommerce_shot.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imgData1 , _ := os . ReadFile ( "/path/to/your/dress.png" ) imgData2 , _ := os . ReadFile ( "/path/to/your/model.png" ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData1 , }, }, & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData2 , }, }, genai . NewPartFromText ( "Create a professional e-commerce fashion photo. 
Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "fashion_ecommerce_shot.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH1 = /path/to/your/dress.png IMG_PATH2 = /path/to/your/model.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG1_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH1 " 2>&1 ) IMG2_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH2 " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG1_BASE64 \" } }, { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG2_BASE64 \" } }, {\"text\": \"Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > fashion_ecommerce_shot.png Input 1 Input 2 Output A professionally shot photo of a blue floral summer dress... Full-body shot of a woman with her hair in a bun... Create a professional e-commerce fashion photo... 5. High-fidelity detail preservation To ensure critical details (like a face or logo) are preserved during an edit, describe them in great detail along with your edit request. Template Using the provided images, place [element from image 2] onto [element from image 1]. Ensure that the features of [element from image 1] remain completely unchanged. The added element should [description of how the element should integrate]. Prompt "Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompts: # 1. Woman: "A professional headshot of a woman with brown hair and blue eyes, wearing a plain black t-shirt, against a neutral studio background." # 2. Logo: "A simple, modern logo with the letters 'G' and 'A' in a white circle." woman_image = Image . open ( '/path/to/your/woman.png' ) logo_image = Image . open ( '/path/to/your/logo.png' ) text_input = """Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. 
The logo should look like it's naturally printed on the fabric, following the folds of the shirt.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ woman_image , logo_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'woman_with_logo.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath1 = "/path/to/your/woman.png" ; const imageData1 = fs . readFileSync ( imagePath1 ); const base64Image1 = imageData1 . toString ( "base64" ); const imagePath2 = "/path/to/your/logo.png" ; const imageData2 = fs . readFileSync ( imagePath2 ); const base64Image2 = imageData2 . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image1 , }, }, { inlineData : { mimeType : "image/png" , data : base64Image2 , }, }, { text : "Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "woman_with_logo.png" , buffer ); console . log ( "Image saved as woman_with_logo.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imgData1 , _ := os . ReadFile ( "/path/to/your/woman.png" ) imgData2 , _ := os . ReadFile ( "/path/to/your/logo.png" ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData1 , }, }, & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData2 , }, }, genai . NewPartFromText ( "Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "woman_with_logo.png" _ = os . 
WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH1 = /path/to/your/woman.png IMG_PATH2 = /path/to/your/logo.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG1_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH1 " 2>&1 ) IMG2_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH2 " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG1_BASE64 \" } }, { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG2_BASE64 \" } }, {\"text\": \"Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > woman_with_logo.png Input 1 Input 2 Output A professional headshot of a woman with brown hair and blue eyes... A simple, modern logo with the letters 'G' and 'A'... Take the first image of the woman with brown hair, blue eyes, and a neutral expression... Best Practices To elevate your results from good to great, incorporate these professional strategies into your workflow. Be Hyper-Specific: The more detail you provide, the more control you have. Instead of "fantasy armor," describe it: "ornate elven plate armor, etched with silver leaf patterns, with a high collar and pauldrons shaped like falcon wings." Provide Context and Intent: Explain the purpose of the image. The model's understanding of context will influence the final output. For example, "Create a logo for a high-end, minimalist skincare brand" will yield better results than just "Create a logo." Iterate and Refine: Don't expect a perfect image on the first try. Use the conversational nature of the model to make small changes. Follow up with prompts like, "That's great, but can you make the lighting a bit warmer?" or "Keep everything the same, but change the character's expression to be more serious." Use Step-by-Step Instructions: For complex scenes with many elements, break your prompt into steps. "First, create a background of a serene, misty forest at dawn. Then, in the foreground, add a moss-covered ancient stone altar. Finally, place a single, glowing sword on top of the altar." Use "Semantic Negative Prompts": Instead of saying "no cars," describe the desired scene positively: "an empty, deserted street with no signs of traffic." Control the Camera: Use photographic and cinematic language to control the composition. Terms like wide-angle shot , macro shot , low-angle perspective . Limitations For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN. Image generation does not support audio or video inputs. The model won't always follow the exact number of image outputs that the user explicitly asked for. The model works best with up to 3 images as an input. When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text. Uploading images of children is not currently supported in EEA, CH, and UK. All generated images include a SynthID watermark . 
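The multi-turn image editing mode and the "Iterate and Refine" best practice described above can also be driven through the SDK's chat interface, so each follow-up prompt edits the image produced in the previous turn without re-uploading it. The following is a minimal Python sketch rather than an official sample from this page: it assumes the google-genai chat interface returns image parts the same way generate_content does, and the prompts, file names, and the save_first_image helper are illustrative only.

from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()
# A chat session keeps earlier turns (including generated images) as context.
chat = client.chats.create(model="gemini-2.5-flash-image-preview")

def save_first_image(response, filename):
    # Write the first inline image part of a response to disk, if one exists.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save(filename)
            return True
    return False

# Turn 1: create the starting image.
r1 = chat.send_message("Create a picture of a vintage blue car parked by the sea.")
save_first_image(r1, "car_v1.png")

# Turn 2: a small, targeted refinement of the previous result.
r2 = chat.send_message("Keep everything the same, but turn the car into a convertible.")
save_first_image(r2, "car_v2.png")

# Turn 3: keep iterating conversationally until the image is right.
r3 = chat.send_message("Now change the color to yellow and make the lighting a bit warmer.")
save_first_image(r3, "car_v3.png")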
When to use Imagen In addition to using Gemini's built-in image generation capabilities, you can also access Imagen , our specialized image generation model, through the Gemini API. Attribute Imagen Gemini Native Image Strengths Most capable image generation model to date. Recommended for photorealistic images, sharper clarity, improved spelling and typography. Default recommendation. Unparalleled flexibility, contextual understanding, and simple, mask-free editing. Uniquely capable of multi-turn conversational editing. Availability Generally available Preview (Production usage allowed) Latency Low . Optimized for near-real-time performance. Higher. More computation is required for its advanced capabilities. Cost Cost-effective for specialized tasks. $0.02/image to $0.12/image Token-based pricing. $30 per 1 million tokens for image output (image output tokenized at 1290 tokens per image flat, up to 1024x1024px) Recommended tasks Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities. Infusing branding, style, or generating logos and product designs. Generating advanced spelling or typography. Interleaved text and image generation to seamlessly blend text and images. Combine creative elements from multiple images with a single prompt. Make highly specific edits to images, modify individual elements with simple language commands, and iteratively work on an image. Apply a specific design or texture from one image to another while preserving the original subject's form and details. Imagen 4 should be your go-to model starting to generate images with Imagen. Choose Imagen 4 Ultra for advanced use-cases or when you need the best image quality (note that can only generate one image at a time). What's next Find more examples and code samples in the cookbook guide . Check out the Veo guide to learn how to generate videos with the Gemini API. To learn more about Gemini models, see Gemini models . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-08 UTC.
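For readers weighing the Imagen option in the comparison above, a minimal sketch of the Imagen path through the same google-genai SDK follows. The model ID, config fields, prompt, and file names are assumptions drawn from the separate Imagen guide rather than from this page, so check that guide for current values. For the Gemini native path, the flat rate quoted above works out to roughly 1290 tokens x $30 / 1,000,000, or about $0.039 per generated image.

from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()

# Dedicated Imagen endpoint: returns generated_images rather than content parts.
# The model ID below is an assumption; use the ID listed in the Imagen guide.
response = client.models.generate_images(
    model="imagen-4.0-generate-001",
    prompt="A photorealistic product shot of a matte black ceramic coffee mug on polished concrete.",
    config=types.GenerateImagesConfig(number_of_images=1),
)

for i, generated in enumerate(response.generated_images):
    Image.open(BytesIO(generated.image.image_bytes)).save(f"imagen_example_{i}.png")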
text_content/docs_image-generation_94c9fcfa.txt ADDED
@@ -0,0 +1,5 @@
+ URL: https://ai.google.dev/gemini-api/docs/image-generation#image_generation_text-to-image
+ Title: Image generation with Gemini (aka Nano Banana)  |  Gemini API  |  Google AI for Developers
+ ==================================================
+
+ Image generation with Gemini (aka Nano Banana) | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Image generation with Gemini (aka Nano Banana) Gemini can generate and process images conversationally. You can prompt Gemini with text, images, or a combination of both allowing you to create, edit, and iterate on visuals with unprecedented control: Text-to-Image: Generate high-quality images from simple or complex text descriptions. Image + Text-to-Image (Editing): Provide an image and use text prompts to add, remove, or modify elements, change the style, or adjust the color grading. Multi-Image to Image (Composition & Style Transfer): Use multiple input images to compose a new scene or transfer the style from one image to another. Iterative Refinement: Engage in a conversation to progressively refine your image over multiple turns, making small adjustments until it's perfect. High-Fidelity Text Rendering: Accurately generate images that contain legible and well-placed text, ideal for logos, diagrams, and posters. All generated images include a SynthID watermark . Image generation (text-to-image) The following code demonstrates how to generate an image based on a descriptive prompt. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () prompt = ( "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme" ) response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ prompt ], ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO ( part . inline_data . data )) image . save ( "generated_image.png" ) JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme" ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "Create a picture of a nano banana dish in a " + " fancy restaurant with a Gemini theme" ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . 
Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-native-image.png AI-generated image of a nano banana dish in a Gemini-themed restaurant Image editing (text-and-image-to-image) Reminder : Make sure you have the necessary rights to any images you upload. Don't generate content that infringes on others' rights, including videos or images that deceive, harass, or harm. Your use of this generative AI service is subject to our Prohibited Use Policy . The following example demonstrates uploading base64 encoded images. For multiple images, larger payloads, and supported MIME types, check the Image understanding page. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () prompt = ( "Create a picture of my cat eating a nano-banana in a " "fancy restaurant under the Gemini constellation" ) image = Image . open ( "/path/to/cat_image.png" ) response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ prompt , image ], ) for part in response . candidates [ 0 ] . content . parts : if part . text is not None : print ( part . text ) elif part . inline_data is not None : image = Image . open ( BytesIO ( part . inline_data . data )) image . save ( "generated_image.png" ) JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "path/to/cat_image.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { text : "Create a picture of my cat eating a nano-banana in a " + "fancy restaurant under the Gemini constellation" }, { inlineData : { mimeType : "image/png" , data : base64Image , }, }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "gemini-native-image.png" , buffer ); console . log ( "Image saved as gemini-native-image.png" ); } } } main (); Go package main import ( "context" "fmt" "log" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/cat_image.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { genai . NewPartFromText ( "Create a picture of my cat eating a nano-banana in a fancy restaurant under the Gemini constellation" ), & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models .
GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "gemini_generated_image.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/cat_image.jpeg if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ {\"text\": \"'Create a picture of my cat eating a nano-banana in a fancy restaurant under the Gemini constellation\"}, { \"inline_data\": { \"mime_type\":\"image/jpeg\", \"data\": \" $IMG_BASE64 \" } } ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > gemini-edited-image.png AI-generated image of a cat eating a nano banana Other image generation modes Gemini supports other image interaction modes based on prompt structure and context, including: Text to image(s) and text (interleaved): Outputs images with related text. Example prompt: "Generate an illustrated recipe for a paella." Image(s) and text to image(s) and text (interleaved) : Uses input images and text to create new related images and text. Example prompt: (With an image of a furnished room) "What other color sofas would work in my space? can you update the image?" Multi-turn image editing (chat): Keep generating and editing images conversationally. Example prompts: [upload an image of a blue car.] , "Turn this car into a convertible.", "Now change the color to yellow." Prompting guide and strategies Mastering Gemini 2.5 Flash Image Generation starts with one fundamental principle: Describe the scene, don't just list keywords. The model's core strength is its deep language understanding. A narrative, descriptive paragraph will almost always produce a better, more coherent image than a list of disconnected words. Prompts for generating images The following strategies will help you create effective prompts to generate exactly the images you're looking for. 1. Photorealistic scenes For realistic images, use photography terms. Mention camera angles, lens types, lighting, and fine details to guide the model toward a photorealistic result. Template A photorealistic [shot type] of [subject], [action or expression], set in [environment]. The scene is illuminated by [lighting description], creating a [mood] atmosphere. Captured with a [camera/lens details], emphasizing [key textures and details]. The image should be in a [aspect ratio] format. Prompt A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful. Vertical portrait orientation. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . 
Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'photorealistic_example.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "photorealistic_example.png" , buffer ); console . log ( "Image saved as photorealistic_example.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "photorealistic_example.png" _ = os . 
WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A photorealistic close-up portrait of an elderly Japanese ceramicist with deep, sun-etched wrinkles and a warm, knowing smile. He is carefully inspecting a freshly glazed tea bowl. The setting is his rustic, sun-drenched workshop with pottery wheels and shelves of clay pots in the background. The scene is illuminated by soft, golden hour light streaming through a window, highlighting the fine texture of the clay and the fabric of his apron. Captured with an 85mm portrait lens, resulting in a soft, blurred background (bokeh). The overall mood is serene and masterful."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > photorealistic_example.png A photorealistic close-up portrait of an elderly Japanese ceramicist... 2. Stylized illustrations & stickers To create stickers, icons, or assets, be explicit about the style and request a transparent background. Template A [style] sticker of a [subject], featuring [key characteristics] and a [color palette]. The design should have [line style] and [shading style]. The background must be transparent. Prompt A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'red_panda_sticker.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "red_panda_sticker.png" , buffer ); console . log ( "Image saved as red_panda_sticker.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . 
GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "red_panda_sticker.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A kawaii-style sticker of a happy red panda wearing a tiny bamboo hat. It' "'" 's munching on a green bamboo leaf. The design features bold, clean outlines, simple cel-shading, and a vibrant color palette. The background must be white."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > red_panda_sticker.png A kawaii-style sticker of a happy red panda... 3. Accurate text in images Gemini excels at rendering text. Be clear about the text, the font style (descriptively), and the overall design. Template Create a [image type] for [brand/concept] with the text "[text to render]" in a [font style]. The design should be [style description], with a [color scheme]. Prompt Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'logo_example.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "logo_example.png" , buffer ); console . 
log ( "Image saved as logo_example.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "logo_example.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "Create a modern, minimalist logo for a coffee shop called ' "'" 'The Daily Grind' "'" '. The text should be in a clean, bold, sans-serif font. The design should feature a simple, stylized icon of a a coffee bean seamlessly integrated with the text. The color scheme is black and white."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > logo_example.png Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'... 4. Product mockups & commercial photography Perfect for creating clean, professional product shots for e-commerce, advertising, or branding. Template A high-resolution, studio-lit product photograph of a [product description] on a [background surface/description]. The lighting is a [lighting setup, e.g., three-point softbox setup] to [lighting purpose]. The camera angle is a [angle type] to showcase [specific feature]. Ultra-realistic, with sharp focus on [key detail]. [Aspect ratio]. Prompt A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'product_mockup.png' ) image . 
show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "product_mockup.png" , buffer ); console . log ( "Image saved as product_mockup.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "product_mockup.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug in matte black, presented on a polished concrete surface. The lighting is a three-point softbox setup designed to create soft, diffused highlights and eliminate harsh shadows. The camera angle is a slightly elevated 45-degree shot to showcase its clean lines. Ultra-realistic, with sharp focus on the steam rising from the coffee. Square image."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > product_mockup.png A high-resolution, studio-lit product photograph of a minimalist ceramic coffee mug... 5. Minimalist & negative space design Excellent for creating backgrounds for websites, presentations, or marketing materials where text will be overlaid. Template A minimalist composition featuring a single [subject] positioned in the [bottom-right/top-left/etc.] of the frame. The background is a vast, empty [color] canvas, creating significant negative space. Soft, subtle lighting. [Aspect ratio]. Prompt A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. 
The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'minimalist_design.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "minimalist_design.png" , buffer ); console . log ( "Image saved as minimalist_design.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "minimalist_design.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A minimalist composition featuring a single, delicate red maple leaf positioned in the bottom-right of the frame. The background is a vast, empty off-white canvas, creating significant negative space for text. Soft, diffused lighting from the top left. Square image."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > minimalist_design.png A minimalist composition featuring a single, delicate red maple leaf... 6. 
Sequential art (Comic panel / Storyboard) Builds on character consistency and scene description to create panels for visual storytelling. Template A single comic book panel in a [art style] style. In the foreground, [character description and action]. In the background, [setting details]. The panel has a [dialogue/caption box] with the text "[Text]". The lighting creates a [mood] mood. [Aspect ratio]. Prompt A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads "The city was a tough place to keep secrets." The lighting is harsh, creating a dramatic, somber mood. Landscape. Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads \" The city was a tough place to keep secrets. \" The lighting is harsh, creating a dramatic, somber mood. Landscape." , ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'comic_panel.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const prompt = "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads \"The city was a tough place to keep secrets.\" The lighting is harsh, creating a dramatic, somber mood. Landscape." ; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "comic_panel.png" , buffer ); console . log ( "Image saved as comic_panel.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , genai . Text ( "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. 
A caption box at the top reads \"The city was a tough place to keep secrets.\" The lighting is harsh, creating a dramatic, somber mood. Landscape." ), ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "comic_panel.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Type: application/json" \ -d '{ "contents": [{ "parts": [ {"text": "A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads \"The city was a tough place to keep secrets.\" The lighting is harsh, creating a dramatic, somber mood. Landscape."} ] }] }' \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > comic_panel.png A single comic book panel in a gritty, noir art style... Prompts for editing images These examples show how to provide images alongside your text prompts for editing, composition, and style transfer. 1. Adding and removing elements Provide an image and describe your change. The model will match the original image's style, lighting, and perspective. Template Using the provided image of [subject], please [add/remove/modify] [element] to/from the scene. Ensure the change is [description of how the change should integrate]. Prompt "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and matches the soft lighting of the photo." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompt: "A photorealistic picture of a fluffy ginger cat sitting on a wooden floor, looking directly at the camera. Soft, natural light from a window." image_input = Image . open ( '/path/to/your/cat_photo.png' ) text_input = """Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ text_input , image_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'cat_with_hat.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "/path/to/your/cat_photo.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { text : "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off." }, { inlineData : { mimeType : "image/png" , data : base64Image , }, }, ]; const response = await ai . models . 
generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "cat_with_hat.png" , buffer ); console . log ( "Image saved as cat_with_hat.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/your/cat_photo.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { genai . NewPartFromText ( "Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off." ), & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "cat_with_hat.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/cat_photo.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ {\"text\": \"Using the provided image of my cat, please add a small, knitted wizard hat on its head. Make it look like it's sitting comfortably and not falling off.\"}, { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG_BASE64 \" } } ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > cat_with_hat.png Input Output A photorealistic picture of a fluffy ginger cat... Using the provided image of my cat, please add a small, knitted wizard hat... 2. Inpainting (Semantic masking) Conversationally define a "mask" to edit a specific part of an image while leaving the rest untouched. Template Using the provided image, change only the [specific element] to [new element/description]. Keep everything else in the image exactly the same, preserving the original style, lighting, and composition. Prompt "Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompt: "A wide shot of a modern, well-lit living room with a prominent blue sofa in the center. A coffee table is in front of it and a large window is in the background." living_room_image = Image . 
open ( '/path/to/your/living_room.png' ) text_input = """Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ living_room_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'living_room_edited.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "/path/to/your/living_room.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image , }, }, { text : "Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "living_room_edited.png" , buffer ); console . log ( "Image saved as living_room_edited.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/your/living_room.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, genai . NewPartFromText ( "Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "living_room_edited.png" _ = os . 
WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/living_room.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG_BASE64 \" } }, {\"text\": \"Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa. Keep the rest of the room, including the pillows on the sofa and the lighting, unchanged.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > living_room_edited.png Input Output A wide shot of a modern, well-lit living room... Using the provided image of a living room, change only the blue sofa to be a vintage, brown leather chesterfield sofa... 3. Style transfer Provide an image and ask the model to recreate its content in a different artistic style. Template Transform the provided photograph of [subject] into the artistic style of [artist/art style]. Preserve the original composition but render it with [description of stylistic elements]. Prompt "Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompt: "A photorealistic, high-resolution photograph of a busy city street in New York at night, with bright neon signs, yellow taxis, and tall skyscrapers." city_image = Image . open ( '/path/to/your/city.png' ) text_input = """Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ city_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'city_style_transfer.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath = "/path/to/your/city.png" ; const imageData = fs . readFileSync ( imagePath ); const base64Image = imageData . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image , }, }, { text : "Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows." 
}, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "city_style_transfer.png" , buffer ); console . log ( "Image saved as city_style_transfer.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imagePath := "/path/to/your/city.png" imgData , _ := os . ReadFile ( imagePath ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData , }, }, genai . NewPartFromText ( "Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "city_style_transfer.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH = /path/to/your/city.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG_BASE64 \" } }, {\"text\": \"Transform the provided photograph of a modern city street at night into the artistic style of Vincent van Gogh's 'Starry Night'. Preserve the original composition of buildings and cars, but render all elements with swirling, impasto brushstrokes and a dramatic palette of deep blues and bright yellows.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > city_style_transfer.png Input Output A photorealistic, high-resolution photograph of a busy city street... Transform the provided photograph of a modern city street at night... 4. Advanced composition: Combining multiple images Provide multiple images as context to create a new, composite scene. This is perfect for product mockups or creative collages. Template Create a new image by combining the elements from the provided images. Take the [element from image 1] and place it with/on the [element from image 2]. The final image should be a [description of the final scene]. Prompt "Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. 
Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompts: # 1. Dress: "A professionally shot photo of a blue floral summer dress on a plain white background, ghost mannequin style." # 2. Model: "Full-body shot of a woman with her hair in a bun, smiling, standing against a neutral grey studio background." dress_image = Image . open ( '/path/to/your/dress.png' ) model_image = Image . open ( '/path/to/your/model.png' ) text_input = """Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ dress_image , model_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'fashion_ecommerce_shot.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath1 = "/path/to/your/dress.png" ; const imageData1 = fs . readFileSync ( imagePath1 ); const base64Image1 = imageData1 . toString ( "base64" ); const imagePath2 = "/path/to/your/model.png" ; const imageData2 = fs . readFileSync ( imagePath2 ); const base64Image2 = imageData2 . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image1 , }, }, { inlineData : { mimeType : "image/png" , data : base64Image2 , }, }, { text : "Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "fashion_ecommerce_shot.png" , buffer ); console . log ( "Image saved as fashion_ecommerce_shot.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imgData1 , _ := os . ReadFile ( "/path/to/your/dress.png" ) imgData2 , _ := os . ReadFile ( "/path/to/your/model.png" ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData1 , }, }, & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData2 , }, }, genai . NewPartFromText ( "Create a professional e-commerce fashion photo. 
Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "fashion_ecommerce_shot.png" _ = os . WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH1 = /path/to/your/dress.png IMG_PATH2 = /path/to/your/model.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG1_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH1 " 2>&1 ) IMG2_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH2 " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG1_BASE64 \" } }, { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG2_BASE64 \" } }, {\"text\": \"Create a professional e-commerce fashion photo. Take the blue floral dress from the first image and let the woman from the second image wear it. Generate a realistic, full-body shot of the woman wearing the dress, with the lighting and shadows adjusted to match the outdoor environment.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > fashion_ecommerce_shot.png Input 1 Input 2 Output A professionally shot photo of a blue floral summer dress... Full-body shot of a woman with her hair in a bun... Create a professional e-commerce fashion photo... 5. High-fidelity detail preservation To ensure critical details (like a face or logo) are preserved during an edit, describe them in great detail along with your edit request. Template Using the provided images, place [element from image 2] onto [element from image 1]. Ensure that the features of [element from image 1] remain completely unchanged. The added element should [description of how the element should integrate]. Prompt "Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt." Python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai . Client () # Base image prompts: # 1. Woman: "A professional headshot of a woman with brown hair and blue eyes, wearing a plain black t-shirt, against a neutral studio background." # 2. Logo: "A simple, modern logo with the letters 'G' and 'A' in a white circle." woman_image = Image . open ( '/path/to/your/woman.png' ) logo_image = Image . open ( '/path/to/your/logo.png' ) text_input = """Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. 
The logo should look like it's naturally printed on the fabric, following the folds of the shirt.""" # Generate an image from a text prompt response = client . models . generate_content ( model = "gemini-2.5-flash-image-preview" , contents = [ woman_image , logo_image , text_input ], ) image_parts = [ part . inline_data . data for part in response . candidates [ 0 ] . content . parts if part . inline_data ] if image_parts : image = Image . open ( BytesIO ( image_parts [ 0 ])) image . save ( 'woman_with_logo.png' ) image . show () JavaScript import { GoogleGenAI , Modality } from "@google/genai" ; import * as fs from "node:fs" ; async function main () { const ai = new GoogleGenAI ({}); const imagePath1 = "/path/to/your/woman.png" ; const imageData1 = fs . readFileSync ( imagePath1 ); const base64Image1 = imageData1 . toString ( "base64" ); const imagePath2 = "/path/to/your/logo.png" ; const imageData2 = fs . readFileSync ( imagePath2 ); const base64Image2 = imageData2 . toString ( "base64" ); const prompt = [ { inlineData : { mimeType : "image/png" , data : base64Image1 , }, }, { inlineData : { mimeType : "image/png" , data : base64Image2 , }, }, { text : "Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash-image-preview" , contents : prompt , }); for ( const part of response . candidates [ 0 ]. content . parts ) { if ( part . text ) { console . log ( part . text ); } else if ( part . inlineData ) { const imageData = part . inlineData . data ; const buffer = Buffer . from ( imageData , "base64" ); fs . writeFileSync ( "woman_with_logo.png" , buffer ); console . log ( "Image saved as woman_with_logo.png" ); } } } main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imgData1 , _ := os . ReadFile ( "/path/to/your/woman.png" ) imgData2 , _ := os . ReadFile ( "/path/to/your/logo.png" ) parts := [] * genai . Part { & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData1 , }, }, & genai . Part { InlineData : & genai . Blob { MIMEType : "image/png" , Data : imgData2 , }, }, genai . NewPartFromText ( "Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash-image-preview" , contents , ) for _ , part := range result . Candidates [ 0 ]. Content . Parts { if part . Text != "" { fmt . Println ( part . Text ) } else if part . InlineData != nil { imageBytes := part . InlineData . Data outputFilename := "woman_with_logo.png" _ = os . 
WriteFile ( outputFilename , imageBytes , 0644 ) } } } REST IMG_PATH1 = /path/to/your/woman.png IMG_PATH2 = /path/to/your/logo.png if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMG1_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH1 " 2>&1 ) IMG2_BASE64 = $( base64 " $B64FLAGS " " $IMG_PATH2 " 2>&1 ) curl -X POST \ "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image-preview:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -d "{ \"contents\": [{ \"parts\":[ { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG1_BASE64 \" } }, { \"inline_data\": { \"mime_type\":\"image/png\", \"data\": \" $IMG2_BASE64 \" } }, {\"text\": \"Take the first image of the woman with brown hair, blue eyes, and a neutral expression. Add the logo from the second image onto her black t-shirt. Ensure the woman's face and features remain completely unchanged. The logo should look like it's naturally printed on the fabric, following the folds of the shirt.\"} ] }] }" \ | grep -o '"data": "[^"]*"' \ | cut -d '"' -f4 \ | base64 --decode > woman_with_logo.png Input 1 Input 2 Output A professional headshot of a woman with brown hair and blue eyes... A simple, modern logo with the letters 'G' and 'A'... Take the first image of the woman with brown hair, blue eyes, and a neutral expression... Best Practices To elevate your results from good to great, incorporate these professional strategies into your workflow. Be Hyper-Specific: The more detail you provide, the more control you have. Instead of "fantasy armor," describe it: "ornate elven plate armor, etched with silver leaf patterns, with a high collar and pauldrons shaped like falcon wings." Provide Context and Intent: Explain the purpose of the image. The model's understanding of context will influence the final output. For example, "Create a logo for a high-end, minimalist skincare brand" will yield better results than just "Create a logo." Iterate and Refine: Don't expect a perfect image on the first try. Use the conversational nature of the model to make small changes. Follow up with prompts like, "That's great, but can you make the lighting a bit warmer?" or "Keep everything the same, but change the character's expression to be more serious." Use Step-by-Step Instructions: For complex scenes with many elements, break your prompt into steps. "First, create a background of a serene, misty forest at dawn. Then, in the foreground, add a moss-covered ancient stone altar. Finally, place a single, glowing sword on top of the altar." Use "Semantic Negative Prompts": Instead of saying "no cars," describe the desired scene positively: "an empty, deserted street with no signs of traffic." Control the Camera: Use photographic and cinematic language to control the composition. Terms like wide-angle shot , macro shot , low-angle perspective . Limitations For best performance, use the following languages: EN, es-MX, ja-JP, zh-CN, hi-IN. Image generation does not support audio or video inputs. The model won't always follow the exact number of image outputs that the user explicitly asked for. The model works best with up to 3 images as an input. When generating text for an image, Gemini works best if you first generate the text and then ask for an image with the text. Uploading images of children is not currently supported in EEA, CH, and UK. All generated images include a SynthID watermark . 
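The "Iterate and Refine" tip above pairs naturally with the multi-image inputs shown earlier: generate an image, then pass the returned image back with a short follow-up instruction instead of rewriting the whole prompt. The following is a minimal sketch of that loop in Python, using only the generate_content call and inline-image handling already shown on this page; the model name matches the preview model used in the examples above, and the prompts and output filename are invented for illustration.

from io import BytesIO
from PIL import Image
from google import genai

client = genai.Client()
MODEL = "gemini-2.5-flash-image-preview"  # preview image model used in the examples above

def first_image(response):
    # Return the first inline image in a generateContent response as a PIL Image.
    for part in response.candidates[0].content.parts:
        if part.inline_data:
            return Image.open(BytesIO(part.inline_data.data))
    return None

# First pass: generate a draft image (hypothetical prompt).
draft = first_image(client.models.generate_content(
    model=MODEL,
    contents=["A minimalist logo for a specialty coffee shop, flat vector style"],
))

# Second pass: keep the draft, change one thing, as the "Iterate and Refine" tip suggests.
refined = None
if draft is not None:
    refined = first_image(client.models.generate_content(
        model=MODEL,
        contents=[draft, "Keep everything the same, but make the lighting a bit warmer."],
    ))

if refined:
    refined.save("refined_logo.png")

Because each follow-up names only the change you want, the rest of the image tends to stay stable, which is the behavior the conversational-editing advice above relies on.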
When to use Imagen In addition to using Gemini's built-in image generation capabilities, you can also access Imagen , our specialized image generation model, through the Gemini API. Attribute Imagen Gemini Native Image Strengths Most capable image generation model to date. Recommended for photorealistic images, sharper clarity, improved spelling and typography. Default recommendation. Unparalleled flexibility, contextual understanding, and simple, mask-free editing. Uniquely capable of multi-turn conversational editing. Availability Generally available Preview (Production usage allowed) Latency Low . Optimized for near-real-time performance. Higher. More computation is required for its advanced capabilities. Cost Cost-effective for specialized tasks. $0.02/image to $0.12/image Token-based pricing. $30 per 1 million tokens for image output (image output tokenized at 1290 tokens per image flat, up to 1024x1024px) Recommended tasks Image quality, photorealism, artistic detail, or specific styles (e.g., impressionism, anime) are top priorities. Infusing branding, style, or generating logos and product designs. Generating advanced spelling or typography. Interleaved text and image generation to seamlessly blend text and images. Combine creative elements from multiple images with a single prompt. Make highly specific edits to images, modify individual elements with simple language commands, and iteratively work on an image. Apply a specific design or texture from one image to another while preserving the original subject's form and details. Imagen 4 should be your go-to model when you start generating images with Imagen. Choose Imagen 4 Ultra for advanced use cases or when you need the best image quality (note that it can only generate one image at a time). What's next Find more examples and code samples in the cookbook guide . Check out the Veo guide to learn how to generate videos with the Gemini API. To learn more about Gemini models, see Gemini models . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-08 UTC.
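As a back-of-the-envelope illustration of the token-based pricing quoted in the comparison above, a single generated image (1290 output tokens, at $30 per 1 million output tokens) works out to roughly $0.039. The snippet below just restates that arithmetic with the figures quoted on this page; it is not an official pricing calculator, and actual billing may differ.

# Rough per-image cost for Gemini native image output, using the numbers quoted above.
TOKENS_PER_IMAGE = 1290          # flat rate per image, up to 1024x1024px
PRICE_PER_MILLION_TOKENS = 30.0  # USD per 1M image output tokens

cost_per_image = TOKENS_PER_IMAGE / 1_000_000 * PRICE_PER_MILLION_TOKENS
print(f"~${cost_per_image:.4f} per generated image")  # ~$0.0387

Note that this covers only the image output tokens; prompt input is billed separately.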
text_content/docs_image-understanding_3665b2dc.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/image-understanding
  Title: Image understanding  |  Gemini API  |  Google AI for Developers
  ==================================================
 
- Image understanding | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Image understanding Gemini models are built to be multimodal from the ground up, unlocking a wide range of image processing and computer vision tasks including but not limited to image captioning, classification, and visual question answering without having to train specialized ML models. Tip: In addition to their general multimodal capabilities, Gemini models (2.0 and newer) offer improved accuracy for specific use cases like object detection and segmentation , through additional training. See the Capabilities section for more details. Passing images to Gemini You can provide images as input to Gemini using two methods: Passing inline image data : Ideal for smaller files (total request size less than 20MB, including prompts). Uploading images using the File API : Recommended for larger files or for reusing images across multiple requests. Passing inline image data You can pass inline image data in the request to generateContent . You can provide image data as Base64 encoded strings or by reading local files directly (depending on the language). The following example shows how to read an image from a local file and pass it to generateContent API for processing. Python from google.genai import types with open ( 'path/to/small-sample.jpg' , 'rb' ) as f : image_bytes = f . read () response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ types . Part . from_bytes ( data = image_bytes , mime_type = 'image/jpeg' , ), 'Caption this image.' ] ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const base64ImageFile = fs . readFileSync ( "path/to/small-sample.jpg" , { encoding : "base64" , }); const contents = [ { inlineData : { mimeType : "image/jpeg" , data : base64ImageFile , }, }, { text : "Caption this image." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents , }); console . log ( response . text ); Go bytes , _ := os . ReadFile ( "path/to/small-sample.jpg" ) parts := [] * genai . Part { genai . NewPartFromBytes ( bytes , "image/jpeg" ), genai . NewPartFromText ( "Caption this image." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) REST IMG_PATH = "/path/to/your/image1.jpg" if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ { "inline_data": { "mime_type":"image/jpeg", "data": "' " $( base64 $B64FLAGS $IMG_PATH ) " '" } }, {"text": "Caption this image."}, ] }] }' 2 > /dev/null You can also fetch an image from a URL, convert it to bytes, and pass it to generateContent as shown in the following examples. 
Python from google import genai from google.genai import types import requests image_path = "https://goo.gle/instrument-img" image_bytes = requests . get ( image_path ) . content image = types . Part . from_bytes ( data = image_bytes , mime_type = "image/jpeg" ) client = genai . Client () response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "What is this image?" , image ], ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; async function main () { const ai = new GoogleGenAI ({}); const imageUrl = "https://goo.gle/instrument-img" ; const response = await fetch ( imageUrl ); const imageArrayBuffer = await response . arrayBuffer (); const base64ImageData = Buffer . from ( imageArrayBuffer ). toString ( 'base64' ); const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : [ { inlineData : { mimeType : 'image/jpeg' , data : base64ImageData , }, }, { text : "Caption this image." } ], }); console . log ( result . text ); } main (); Go package main import ( "context" "fmt" "os" "io" "net/http" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } // Download the image. imageResp , _ := http . Get ( "https://goo.gle/instrument-img" ) imageBytes , _ := io . ReadAll ( imageResp . Body ) parts := [] * genai . Part { genai . NewPartFromBytes ( imageBytes , "image/jpeg" ), genai . NewPartFromText ( "Caption this image." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST IMG_URL = "https://goo.gle/instrument-img" MIME_TYPE = $( curl -sIL " $IMG_URL " | grep -i '^content-type:' | awk -F ': ' '{print $2}' | sed 's/\r$//' | head -n 1 ) if [[ -z " $MIME_TYPE " || ! " $MIME_TYPE " == image/* ]] ; then MIME_TYPE = "image/jpeg" fi # Check for macOS if [[ " $( uname ) " == "Darwin" ]] ; then IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 -b 0 ) elif [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 ) else IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 -w0 ) fi curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ { "inline_data": { "mime_type":"' " $MIME_TYPE " '", "data": "' " $IMAGE_B64 " '" } }, {"text": "Caption this image."} ] }] }' 2 > /dev/null Note: Inline image data limits your total request size (text prompts, system instructions, and inline bytes) to 20MB. For larger requests, upload image files using the File API. Files API is also more efficient for scenarios that use the same image repeatedly. Uploading images using the File API For large files or to be able to use the same image file repeatedly, use the Files API. The following code uploads an image file and then uses the file in a call to generateContent . See the Files API guide for more information and examples. Python from google import genai client = genai . Client () my_file = client . files . upload ( file = "path/to/sample.jpg" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ my_file , "Caption this image." ], ) print ( response . 
text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.jpg" , config : { mimeType : "image/jpeg" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Caption this image." , ]), }); console . log ( response . text ); } await main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } uploadedFile , _ := client . Files . UploadFromPath ( ctx , "path/to/sample.jpg" , nil ) parts := [] * genai . Part { genai . NewPartFromText ( "Caption this image." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST IMAGE_PATH = "path/to/sample.jpg" MIME_TYPE = $( file -b --mime-type " ${ IMAGE_PATH } " ) NUM_BYTES = $( wc -c < " ${ IMAGE_PATH } " ) DISPLAY_NAME = IMAGE tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ IMAGE_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq -r ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"file_data":{"mime_type": "' " ${ MIME_TYPE } " '", "file_uri": "' " ${ file_uri } " '"}}, {"text": "Caption this image."}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Prompting with multiple images You can provide multiple images in a single prompt by including multiple image Part objects in the contents array. These can be a mix of inline data (local files or URLs) and File API references. Python from google import genai from google.genai import types client = genai . Client () # Upload the first image image1_path = "path/to/image1.jpg" uploaded_file = client . files . upload ( file = image1_path ) # Prepare the second image as inline data image2_path = "path/to/image2.png" with open ( image2_path , 'rb' ) as f : img2_bytes = f . 
read () # Create the prompt with text and multiple images response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "What is different between these two images?" , uploaded_file , # Use the uploaded file reference types . Part . from_bytes ( data = img2_bytes , mime_type = 'image/png' ) ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); async function main () { // Upload the first image const image1_path = "path/to/image1.jpg" ; const uploadedFile = await ai . files . upload ({ file : image1_path , config : { mimeType : "image/jpeg" }, }); // Prepare the second image as inline data const image2_path = "path/to/image2.png" ; const base64Image2File = fs . readFileSync ( image2_path , { encoding : "base64" , }); // Create the prompt with text and multiple images const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ "What is different between these two images?" , createPartFromUri ( uploadedFile . uri , uploadedFile . mimeType ), { inlineData : { mimeType : "image/png" , data : base64Image2File , }, }, ]), }); console . log ( response . text ); } await main (); Go // Upload the first image image1Path := "path/to/image1.jpg" uploadedFile , _ := client . Files . UploadFromPath ( ctx , image1Path , nil ) // Prepare the second image as inline data image2Path := "path/to/image2.jpeg" imgBytes , _ := os . ReadFile ( image2Path ) parts := [] * genai . Part { genai . NewPartFromText ( "What is different between these two images?" ), genai . NewPartFromBytes ( imgBytes , "image/jpeg" ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . 
Text ()) REST # Upload the first image IMAGE1_PATH = "path/to/image1.jpg" MIME1_TYPE = $( file -b --mime-type " ${ IMAGE1_PATH } " ) NUM1_BYTES = $( wc -c < " ${ IMAGE1_PATH } " ) DISPLAY_NAME1 = IMAGE1 tmp_header_file1 = upload-header1.tmp curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header1.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM1_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME1_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME1 } '}}" 2 > /dev/null upload_url1 = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file1 } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file1 } " curl " ${ upload_url1 } " \ -H "Content-Length: ${ NUM1_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ IMAGE1_PATH } " 2 > /dev/null > file_info1.json file1_uri = $( jq ".file.uri" file_info1.json ) echo file1_uri = $file1_uri # Prepare the second image (inline) IMAGE2_PATH = "path/to/image2.png" MIME2_TYPE = $( file -b --mime-type " ${ IMAGE2_PATH } " ) if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMAGE2_BASE64 = $( base64 $B64FLAGS $IMAGE2_PATH ) # Now generate content using both images curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "What is different between these two images?"}, {"file_data":{"mime_type": "' " ${ MIME1_TYPE } " '", "file_uri": ' $file1_uri '}}, { "inline_data": { "mime_type":"' " ${ MIME2_TYPE } " '", "data": "' " $IMAGE2_BASE64 " '" } } ] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Object detection From Gemini 2.0 onwards, models are further trained to detect objects in an image and get their bounding box coordinates. The coordinates, relative to image dimensions, scale to [0, 1000]. You need to descale these coordinates based on your original image size. Python from google import genai from google.genai import types from PIL import Image import json client = genai . Client () prompt = "Detect the all of the prominent items in the image. The box_2d should be [ymin, xmin, ymax, xmax] normalized to 0-1000." image = Image . open ( "/path/to/image.png" ) config = types . GenerateContentConfig ( response_mime_type = "application/json" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ image , prompt ], config = config ) width , height = image . size bounding_boxes = json . loads ( response . text ) converted_bounding_boxes = [] for bounding_box in bounding_boxes : abs_y1 = int ( bounding_box [ "box_2d" ][ 0 ] / 1000 * height ) abs_x1 = int ( bounding_box [ "box_2d" ][ 1 ] / 1000 * width ) abs_y2 = int ( bounding_box [ "box_2d" ][ 2 ] / 1000 * height ) abs_x2 = int ( bounding_box [ "box_2d" ][ 3 ] / 1000 * width ) converted_bounding_boxes . append ([ abs_x1 , abs_y1 , abs_x2 , abs_y2 ]) print ( "Image size: " , width , height ) print ( "Bounding boxes:" , converted_bounding_boxes ) Note: The model also supports generating bounding boxes based on custom instructions, such as: "Show bounding boxes of all green objects in this image". 
It also support custom labels like "label the items with the allergens they can contain". For more examples, check following notebooks in the Gemini Cookbook : 2D spatial understanding notebook Experimental 3D pointing notebook Segmentation Starting with Gemini 2.5, models not only detect items but also segment them and provide their contour masks. The model predicts a JSON list, where each item represents a segmentation mask. Each item has a bounding box (" box_2d ") in the format [y0, x0, y1, x1] with normalized coordinates between 0 and 1000, a label (" label ") that identifies the object, and finally the segmentation mask inside the bounding box, as base64 encoded png that is a probability map with values between 0 and 255. The mask needs to be resized to match the bounding box dimensions, then binarized at your confidence threshold (127 for the midpoint). Note: For better results, disable thinking by setting the thinking budget to 0. See code sample below for an example. Python from google import genai from google.genai import types from PIL import Image , ImageDraw import io import base64 import json import numpy as np import os client = genai . Client () def parse_json ( json_output : str ): # Parsing out the markdown fencing lines = json_output . splitlines () for i , line in enumerate ( lines ): if line == "```json" : json_output = " \n " . join ( lines [ i + 1 :]) # Remove everything before "```json" output = json_output . split ( "```" )[ 0 ] # Remove everything after the closing "```" break # Exit the loop once "```json" is found return json_output def extract_segmentation_masks ( image_path : str , output_dir : str = "segmentation_outputs" ): # Load and resize image im = Image . open ( image_path ) im . thumbnail ([ 1024 , 1024 ], Image . Resampling . LANCZOS ) prompt = """ Give the segmentation masks for the wooden and glass items. Output a JSON list of segmentation masks where each entry contains the 2D bounding box in the key "box_2d", the segmentation mask in key "mask", and the text label in the key "label". Use descriptive labels. """ config = types . GenerateContentConfig ( thinking_config = types . ThinkingConfig ( thinking_budget = 0 ) # set thinking_budget to 0 for better results in object detection ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ prompt , im ], # Pillow images can be directly passed as inputs (which will be converted by the SDK) config = config ) # Parse JSON response items = json . loads ( parse_json ( response . text )) # Create output directory os . makedirs ( output_dir , exist_ok = True ) # Process each mask for i , item in enumerate ( items ): # Get bounding box coordinates box = item [ "box_2d" ] y0 = int ( box [ 0 ] / 1000 * im . size [ 1 ]) x0 = int ( box [ 1 ] / 1000 * im . size [ 0 ]) y1 = int ( box [ 2 ] / 1000 * im . size [ 1 ]) x1 = int ( box [ 3 ] / 1000 * im . size [ 0 ]) # Skip invalid boxes if y0 > = y1 or x0 > = x1 : continue # Process mask png_str = item [ "mask" ] if not png_str . startswith ( "data:image/png;base64," ): continue # Remove prefix png_str = png_str . removeprefix ( "data:image/png;base64," ) mask_data = base64 . b64decode ( png_str ) mask = Image . open ( io . BytesIO ( mask_data )) # Resize mask to match bounding box mask = mask . resize (( x1 - x0 , y1 - y0 ), Image . Resampling . BILINEAR ) # Convert mask to numpy array for processing mask_array = np . array ( mask ) # Create overlay for this mask overlay = Image . new ( 'RGBA' , im . 
size , ( 0 , 0 , 0 , 0 )) overlay_draw = ImageDraw . Draw ( overlay ) # Create overlay for the mask color = ( 255 , 255 , 255 , 200 ) for y in range ( y0 , y1 ): for x in range ( x0 , x1 ): if mask_array [ y - y0 , x - x0 ] > 128 : # Threshold for mask overlay_draw . point (( x , y ), fill = color ) # Save individual mask and its overlay mask_filename = f " { item [ 'label' ] } _ { i } _mask.png" overlay_filename = f " { item [ 'label' ] } _ { i } _overlay.png" mask . save ( os . path . join ( output_dir , mask_filename )) # Create and save overlay composite = Image . alpha_composite ( im . convert ( 'RGBA' ), overlay ) composite . save ( os . path . join ( output_dir , overlay_filename )) print ( f "Saved mask and overlay for { item [ 'label' ] } to { output_dir } " ) # Example usage if __name__ == "__main__" : extract_segmentation_masks ( "path/to/image.png" ) Check the segmentation example in the cookbook guide for a more detailed example. An example segmentation output with objects and segmentation masks Supported image formats Gemini supports the following image format MIME types: PNG - image/png JPEG - image/jpeg WEBP - image/webp HEIC - image/heic HEIF - image/heif Capabilities All Gemini model versions are multimodal and can be utilized in a wide range of image processing and computer vision tasks including but not limited to image captioning, visual question and answering, image classification, object detection and segmentation. Gemini can reduce the need to use specialized ML models depending on your quality and performance requirements. Some later model versions are specifically trained improve accuracy of specialized tasks in addition to generic capabilities: Gemini 2.0 models are further trained to support enhanced object detection . Gemini 2.5 models are further trained to support enhanced segmentation in addition to object detection . Limitations and key technical information File limit Gemini 2.5 Pro/Flash, 2.0 Flash, 1.5 Pro, and 1.5 Flash support a maximum of 3,600 image files per request. Token calculation Gemini 1.5 Flash and Gemini 1.5 Pro : 258 tokens if both dimensions <= 384 pixels. Larger images are tiled (min tile 256px, max 768px, resized to 768x768), with each tile costing 258 tokens. Gemini 2.0 Flash and Gemini 2.5 Flash/Pro : 258 tokens if both dimensions <= 384 pixels. Larger images are tiled into 768x768 pixel tiles, each costing 258 tokens. Tips and best practices Verify that images are correctly rotated. Use clear, non-blurry images. When using a single image with text, place the text prompt after the image part in the contents array. What's next This guide shows you how to upload image files and generate text outputs from image inputs. To learn more, see the following resources: Files API : Learn more about uploading and managing files for use with Gemini. System instructions : System instructions let you steer the behavior of the model based on your specific needs and use cases. File prompting strategies : The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting. Safety guidance : Sometimes generative AI models produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . 
For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-08 UTC.
 
  Title: Image understanding  |  Gemini API  |  Google AI for Developers
  ==================================================
 
+ Image understanding | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Image understanding Gemini models are built to be multimodal from the ground up, unlocking a wide range of image processing and computer vision tasks including but not limited to image captioning, classification, and visual question answering without having to train specialized ML models. Tip: In addition to their general multimodal capabilities, Gemini models (2.0 and newer) offer improved accuracy for specific use cases like object detection and segmentation , through additional training. See the Capabilities section for more details. Passing images to Gemini You can provide images as input to Gemini using two methods: Passing inline image data : Ideal for smaller files (total request size less than 20MB, including prompts). Uploading images using the File API : Recommended for larger files or for reusing images across multiple requests. Passing inline image data You can pass inline image data in the request to generateContent . You can provide image data as Base64 encoded strings or by reading local files directly (depending on the language). The following example shows how to read an image from a local file and pass it to generateContent API for processing. Python from google.genai import types with open ( 'path/to/small-sample.jpg' , 'rb' ) as f : image_bytes = f . read () response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ types . Part . from_bytes ( data = image_bytes , mime_type = 'image/jpeg' , ), 'Caption this image.' ] ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const base64ImageFile = fs . readFileSync ( "path/to/small-sample.jpg" , { encoding : "base64" , }); const contents = [ { inlineData : { mimeType : "image/jpeg" , data : base64ImageFile , }, }, { text : "Caption this image." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents , }); console . log ( response . text ); Go bytes , _ := os . ReadFile ( "path/to/small-sample.jpg" ) parts := [] * genai . Part { genai . NewPartFromBytes ( bytes , "image/jpeg" ), genai . NewPartFromText ( "Caption this image." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) REST IMG_PATH = "/path/to/your/image1.jpg" if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ { "inline_data": { "mime_type":"image/jpeg", "data": "' " $( base64 $B64FLAGS $IMG_PATH ) " '" } }, {"text": "Caption this image."}, ] }] }' 2 > /dev/null You can also fetch an image from a URL, convert it to bytes, and pass it to generateContent as shown in the following examples. 
Python from google import genai from google.genai import types import requests image_path = "https://goo.gle/instrument-img" image_bytes = requests . get ( image_path ) . content image = types . Part . from_bytes ( data = image_bytes , mime_type = "image/jpeg" ) client = genai . Client () response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "What is this image?" , image ], ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; async function main () { const ai = new GoogleGenAI ({}); const imageUrl = "https://goo.gle/instrument-img" ; const response = await fetch ( imageUrl ); const imageArrayBuffer = await response . arrayBuffer (); const base64ImageData = Buffer . from ( imageArrayBuffer ). toString ( 'base64' ); const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : [ { inlineData : { mimeType : 'image/jpeg' , data : base64ImageData , }, }, { text : "Caption this image." } ], }); console . log ( result . text ); } main (); Go package main import ( "context" "fmt" "os" "io" "net/http" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } // Download the image. imageResp , _ := http . Get ( "https://goo.gle/instrument-img" ) imageBytes , _ := io . ReadAll ( imageResp . Body ) parts := [] * genai . Part { genai . NewPartFromBytes ( imageBytes , "image/jpeg" ), genai . NewPartFromText ( "Caption this image." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST IMG_URL = "https://goo.gle/instrument-img" MIME_TYPE = $( curl -sIL " $IMG_URL " | grep -i '^content-type:' | awk -F ': ' '{print $2}' | sed 's/\r$//' | head -n 1 ) if [[ -z " $MIME_TYPE " || ! " $MIME_TYPE " == image/* ]] ; then MIME_TYPE = "image/jpeg" fi # Check for macOS if [[ " $( uname ) " == "Darwin" ]] ; then IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 -b 0 ) elif [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 ) else IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 -w0 ) fi curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ { "inline_data": { "mime_type":"' " $MIME_TYPE " '", "data": "' " $IMAGE_B64 " '" } }, {"text": "Caption this image."} ] }] }' 2 > /dev/null Note: Inline image data limits your total request size (text prompts, system instructions, and inline bytes) to 20MB. For larger requests, upload image files using the File API. Files API is also more efficient for scenarios that use the same image repeatedly. Uploading images using the File API For large files or to be able to use the same image file repeatedly, use the Files API. The following code uploads an image file and then uses the file in a call to generateContent . See the Files API guide for more information and examples. Python from google import genai client = genai . Client () my_file = client . files . upload ( file = "path/to/sample.jpg" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ my_file , "Caption this image." ], ) print ( response . 
text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.jpg" , config : { mimeType : "image/jpeg" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Caption this image." , ]), }); console . log ( response . text ); } await main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } uploadedFile , _ := client . Files . UploadFromPath ( ctx , "path/to/sample.jpg" , nil ) parts := [] * genai . Part { genai . NewPartFromText ( "Caption this image." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST IMAGE_PATH = "path/to/sample.jpg" MIME_TYPE = $( file -b --mime-type " ${ IMAGE_PATH } " ) NUM_BYTES = $( wc -c < " ${ IMAGE_PATH } " ) DISPLAY_NAME = IMAGE tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ IMAGE_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq -r ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"file_data":{"mime_type": "' " ${ MIME_TYPE } " '", "file_uri": "' " ${ file_uri } " '"}}, {"text": "Caption this image."}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Prompting with multiple images You can provide multiple images in a single prompt by including multiple image Part objects in the contents array. These can be a mix of inline data (local files or URLs) and File API references. Python from google import genai from google.genai import types client = genai . Client () # Upload the first image image1_path = "path/to/image1.jpg" uploaded_file = client . files . upload ( file = image1_path ) # Prepare the second image as inline data image2_path = "path/to/image2.png" with open ( image2_path , 'rb' ) as f : img2_bytes = f . 
read () # Create the prompt with text and multiple images response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "What is different between these two images?" , uploaded_file , # Use the uploaded file reference types . Part . from_bytes ( data = img2_bytes , mime_type = 'image/png' ) ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); async function main () { // Upload the first image const image1_path = "path/to/image1.jpg" ; const uploadedFile = await ai . files . upload ({ file : image1_path , config : { mimeType : "image/jpeg" }, }); // Prepare the second image as inline data const image2_path = "path/to/image2.png" ; const base64Image2File = fs . readFileSync ( image2_path , { encoding : "base64" , }); // Create the prompt with text and multiple images const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ "What is different between these two images?" , createPartFromUri ( uploadedFile . uri , uploadedFile . mimeType ), { inlineData : { mimeType : "image/png" , data : base64Image2File , }, }, ]), }); console . log ( response . text ); } await main (); Go // Upload the first image image1Path := "path/to/image1.jpg" uploadedFile , _ := client . Files . UploadFromPath ( ctx , image1Path , nil ) // Prepare the second image as inline data image2Path := "path/to/image2.jpeg" imgBytes , _ := os . ReadFile ( image2Path ) parts := [] * genai . Part { genai . NewPartFromText ( "What is different between these two images?" ), genai . NewPartFromBytes ( imgBytes , "image/jpeg" ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . 
Text ()) REST # Upload the first image IMAGE1_PATH = "path/to/image1.jpg" MIME1_TYPE = $( file -b --mime-type " ${ IMAGE1_PATH } " ) NUM1_BYTES = $( wc -c < " ${ IMAGE1_PATH } " ) DISPLAY_NAME1 = IMAGE1 tmp_header_file1 = upload-header1.tmp curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header1.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM1_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME1_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME1 } '}}" 2 > /dev/null upload_url1 = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file1 } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file1 } " curl " ${ upload_url1 } " \ -H "Content-Length: ${ NUM1_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ IMAGE1_PATH } " 2 > /dev/null > file_info1.json file1_uri = $( jq ".file.uri" file_info1.json ) echo file1_uri = $file1_uri # Prepare the second image (inline) IMAGE2_PATH = "path/to/image2.png" MIME2_TYPE = $( file -b --mime-type " ${ IMAGE2_PATH } " ) if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMAGE2_BASE64 = $( base64 $B64FLAGS $IMAGE2_PATH ) # Now generate content using both images curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "What is different between these two images?"}, {"file_data":{"mime_type": "' " ${ MIME1_TYPE } " '", "file_uri": ' $file1_uri '}}, { "inline_data": { "mime_type":"' " ${ MIME2_TYPE } " '", "data": "' " $IMAGE2_BASE64 " '" } } ] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Object detection From Gemini 2.0 onwards, models are further trained to detect objects in an image and get their bounding box coordinates. The coordinates, relative to image dimensions, scale to [0, 1000]. You need to descale these coordinates based on your original image size. Python from google import genai from google.genai import types from PIL import Image import json client = genai . Client () prompt = "Detect the all of the prominent items in the image. The box_2d should be [ymin, xmin, ymax, xmax] normalized to 0-1000." image = Image . open ( "/path/to/image.png" ) config = types . GenerateContentConfig ( response_mime_type = "application/json" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ image , prompt ], config = config ) width , height = image . size bounding_boxes = json . loads ( response . text ) converted_bounding_boxes = [] for bounding_box in bounding_boxes : abs_y1 = int ( bounding_box [ "box_2d" ][ 0 ] / 1000 * height ) abs_x1 = int ( bounding_box [ "box_2d" ][ 1 ] / 1000 * width ) abs_y2 = int ( bounding_box [ "box_2d" ][ 2 ] / 1000 * height ) abs_x2 = int ( bounding_box [ "box_2d" ][ 3 ] / 1000 * width ) converted_bounding_boxes . append ([ abs_x1 , abs_y1 , abs_x2 , abs_y2 ]) print ( "Image size: " , width , height ) print ( "Bounding boxes:" , converted_bounding_boxes ) Note: The model also supports generating bounding boxes based on custom instructions, such as: "Show bounding boxes of all green objects in this image". 
It also support custom labels like "label the items with the allergens they can contain". For more examples, check following notebooks in the Gemini Cookbook : 2D spatial understanding notebook Experimental 3D pointing notebook Segmentation Starting with Gemini 2.5, models not only detect items but also segment them and provide their contour masks. The model predicts a JSON list, where each item represents a segmentation mask. Each item has a bounding box (" box_2d ") in the format [y0, x0, y1, x1] with normalized coordinates between 0 and 1000, a label (" label ") that identifies the object, and finally the segmentation mask inside the bounding box, as base64 encoded png that is a probability map with values between 0 and 255. The mask needs to be resized to match the bounding box dimensions, then binarized at your confidence threshold (127 for the midpoint). Note: For better results, disable thinking by setting the thinking budget to 0. See code sample below for an example. Python from google import genai from google.genai import types from PIL import Image , ImageDraw import io import base64 import json import numpy as np import os client = genai . Client () def parse_json ( json_output : str ): # Parsing out the markdown fencing lines = json_output . splitlines () for i , line in enumerate ( lines ): if line == "```json" : json_output = " \n " . join ( lines [ i + 1 :]) # Remove everything before "```json" output = json_output . split ( "```" )[ 0 ] # Remove everything after the closing "```" break # Exit the loop once "```json" is found return json_output def extract_segmentation_masks ( image_path : str , output_dir : str = "segmentation_outputs" ): # Load and resize image im = Image . open ( image_path ) im . thumbnail ([ 1024 , 1024 ], Image . Resampling . LANCZOS ) prompt = """ Give the segmentation masks for the wooden and glass items. Output a JSON list of segmentation masks where each entry contains the 2D bounding box in the key "box_2d", the segmentation mask in key "mask", and the text label in the key "label". Use descriptive labels. """ config = types . GenerateContentConfig ( thinking_config = types . ThinkingConfig ( thinking_budget = 0 ) # set thinking_budget to 0 for better results in object detection ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ prompt , im ], # Pillow images can be directly passed as inputs (which will be converted by the SDK) config = config ) # Parse JSON response items = json . loads ( parse_json ( response . text )) # Create output directory os . makedirs ( output_dir , exist_ok = True ) # Process each mask for i , item in enumerate ( items ): # Get bounding box coordinates box = item [ "box_2d" ] y0 = int ( box [ 0 ] / 1000 * im . size [ 1 ]) x0 = int ( box [ 1 ] / 1000 * im . size [ 0 ]) y1 = int ( box [ 2 ] / 1000 * im . size [ 1 ]) x1 = int ( box [ 3 ] / 1000 * im . size [ 0 ]) # Skip invalid boxes if y0 > = y1 or x0 > = x1 : continue # Process mask png_str = item [ "mask" ] if not png_str . startswith ( "data:image/png;base64," ): continue # Remove prefix png_str = png_str . removeprefix ( "data:image/png;base64," ) mask_data = base64 . b64decode ( png_str ) mask = Image . open ( io . BytesIO ( mask_data )) # Resize mask to match bounding box mask = mask . resize (( x1 - x0 , y1 - y0 ), Image . Resampling . BILINEAR ) # Convert mask to numpy array for processing mask_array = np . array ( mask ) # Create overlay for this mask overlay = Image . new ( 'RGBA' , im . 
size , ( 0 , 0 , 0 , 0 )) overlay_draw = ImageDraw . Draw ( overlay ) # Create overlay for the mask color = ( 255 , 255 , 255 , 200 ) for y in range ( y0 , y1 ): for x in range ( x0 , x1 ): if mask_array [ y - y0 , x - x0 ] > 128 : # Threshold for mask overlay_draw . point (( x , y ), fill = color ) # Save individual mask and its overlay mask_filename = f " { item [ 'label' ] } _ { i } _mask.png" overlay_filename = f " { item [ 'label' ] } _ { i } _overlay.png" mask . save ( os . path . join ( output_dir , mask_filename )) # Create and save overlay composite = Image . alpha_composite ( im . convert ( 'RGBA' ), overlay ) composite . save ( os . path . join ( output_dir , overlay_filename )) print ( f "Saved mask and overlay for { item [ 'label' ] } to { output_dir } " ) # Example usage if __name__ == "__main__" : extract_segmentation_masks ( "path/to/image.png" ) Check the segmentation example in the cookbook guide for a more detailed example. An example segmentation output with objects and segmentation masks Supported image formats Gemini supports the following image format MIME types: PNG - image/png JPEG - image/jpeg WEBP - image/webp HEIC - image/heic HEIF - image/heif Capabilities All Gemini model versions are multimodal and can be utilized in a wide range of image processing and computer vision tasks including but not limited to image captioning, visual question answering, image classification, object detection and segmentation. Gemini can reduce the need to use specialized ML models depending on your quality and performance requirements. Some later model versions are specifically trained to improve the accuracy of specialized tasks in addition to generic capabilities: Gemini 2.0 models are further trained to support enhanced object detection . Gemini 2.5 models are further trained to support enhanced segmentation in addition to object detection . Limitations and key technical information File limit Gemini 2.5 Pro/Flash, 2.0 Flash, 1.5 Pro, and 1.5 Flash support a maximum of 3,600 image files per request. Token calculation Gemini 1.5 Flash and Gemini 1.5 Pro : 258 tokens if both dimensions <= 384 pixels. Larger images are tiled (min tile 256px, max 768px, resized to 768x768), with each tile costing 258 tokens. Gemini 2.0 Flash and Gemini 2.5 Flash/Pro : 258 tokens if both dimensions <= 384 pixels. Larger images are tiled into 768x768 pixel tiles, each costing 258 tokens. A rough formula for calculating the number of tiles is as follows: Calculate the crop unit size, which is roughly floor(min(width, height) / 1.5). Divide each dimension by the crop unit size, round up, and multiply the results together to get the number of tiles. For example, an image of dimensions 960x540 would have a crop unit size of 360. Divide each dimension by 360 and the number of tiles is 3 * 2 = 6. (A short worked sketch of this calculation follows at the end of this page.) Tips and best practices Verify that images are correctly rotated. Use clear, non-blurry images. When using a single image with text, place the text prompt after the image part in the contents array. What's next This guide shows you how to upload image files and generate text outputs from image inputs. To learn more, see the following resources: Files API : Learn more about uploading and managing files for use with Gemini. System instructions : System instructions let you steer the behavior of the model based on your specific needs and use cases. File prompting strategies : The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting.
Safety guidance : Sometimes generative AI models produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-05 UTC.
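The tile-based token estimate described in the "Token calculation" section above (a crop unit of roughly floor(min(width, height) / 1.5), with each 768x768 tile costing 258 tokens on Gemini 2.0/2.5) can be sketched in a few lines of Python. This is an approximation of the rule of thumb given on this page, rounding up the tile count in each dimension so that it reproduces the 960x540 example; it is not an exact reproduction of the API's internal accounting.

import math

def estimate_image_tokens(width: int, height: int) -> int:
    # Small images (both dimensions <= 384px) cost a flat 258 tokens.
    if width <= 384 and height <= 384:
        return 258
    # Approximate crop unit, then count tiles in each dimension (rounded up).
    crop_unit = math.floor(min(width, height) / 1.5)
    tiles = math.ceil(width / crop_unit) * math.ceil(height / crop_unit)
    return tiles * 258

# Worked example from this page: 960x540 -> crop unit 360 -> 3 * 2 = 6 tiles.
print(estimate_image_tokens(960, 540))  # 6 * 258 = 1548 tokens

For the token count actually charged on a request, check the usage metadata returned with the response rather than relying on this estimate.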
text_content/docs_image-understanding_f3fc24eb.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/image-understanding#inline-image
  Title: Image understanding  |  Gemini API  |  Google AI for Developers
  ==================================================
 
- Image understanding | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Image understanding Gemini models are built to be multimodal from the ground up, unlocking a wide range of image processing and computer vision tasks including but not limited to image captioning, classification, and visual question answering without having to train specialized ML models. Tip: In addition to their general multimodal capabilities, Gemini models (2.0 and newer) offer improved accuracy for specific use cases like object detection and segmentation , through additional training. See the Capabilities section for more details. Passing images to Gemini You can provide images as input to Gemini using two methods: Passing inline image data : Ideal for smaller files (total request size less than 20MB, including prompts). Uploading images using the File API : Recommended for larger files or for reusing images across multiple requests. Passing inline image data You can pass inline image data in the request to generateContent . You can provide image data as Base64 encoded strings or by reading local files directly (depending on the language). The following example shows how to read an image from a local file and pass it to generateContent API for processing. Python from google.genai import types with open ( 'path/to/small-sample.jpg' , 'rb' ) as f : image_bytes = f . read () response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ types . Part . from_bytes ( data = image_bytes , mime_type = 'image/jpeg' , ), 'Caption this image.' ] ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const base64ImageFile = fs . readFileSync ( "path/to/small-sample.jpg" , { encoding : "base64" , }); const contents = [ { inlineData : { mimeType : "image/jpeg" , data : base64ImageFile , }, }, { text : "Caption this image." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents , }); console . log ( response . text ); Go bytes , _ := os . ReadFile ( "path/to/small-sample.jpg" ) parts := [] * genai . Part { genai . NewPartFromBytes ( bytes , "image/jpeg" ), genai . NewPartFromText ( "Caption this image." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) REST IMG_PATH = "/path/to/your/image1.jpg" if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ { "inline_data": { "mime_type":"image/jpeg", "data": "' " $( base64 $B64FLAGS $IMG_PATH ) " '" } }, {"text": "Caption this image."}, ] }] }' 2 > /dev/null You can also fetch an image from a URL, convert it to bytes, and pass it to generateContent as shown in the following examples. 
Python from google import genai from google.genai import types import requests image_path = "https://goo.gle/instrument-img" image_bytes = requests . get ( image_path ) . content image = types . Part . from_bytes ( data = image_bytes , mime_type = "image/jpeg" ) client = genai . Client () response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "What is this image?" , image ], ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; async function main () { const ai = new GoogleGenAI ({}); const imageUrl = "https://goo.gle/instrument-img" ; const response = await fetch ( imageUrl ); const imageArrayBuffer = await response . arrayBuffer (); const base64ImageData = Buffer . from ( imageArrayBuffer ). toString ( 'base64' ); const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : [ { inlineData : { mimeType : 'image/jpeg' , data : base64ImageData , }, }, { text : "Caption this image." } ], }); console . log ( result . text ); } main (); Go package main import ( "context" "fmt" "os" "io" "net/http" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } // Download the image. imageResp , _ := http . Get ( "https://goo.gle/instrument-img" ) imageBytes , _ := io . ReadAll ( imageResp . Body ) parts := [] * genai . Part { genai . NewPartFromBytes ( imageBytes , "image/jpeg" ), genai . NewPartFromText ( "Caption this image." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST IMG_URL = "https://goo.gle/instrument-img" MIME_TYPE = $( curl -sIL " $IMG_URL " | grep -i '^content-type:' | awk -F ': ' '{print $2}' | sed 's/\r$//' | head -n 1 ) if [[ -z " $MIME_TYPE " || ! " $MIME_TYPE " == image/* ]] ; then MIME_TYPE = "image/jpeg" fi # Check for macOS if [[ " $( uname ) " == "Darwin" ]] ; then IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 -b 0 ) elif [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 ) else IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 -w0 ) fi curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ { "inline_data": { "mime_type":"' " $MIME_TYPE " '", "data": "' " $IMAGE_B64 " '" } }, {"text": "Caption this image."} ] }] }' 2 > /dev/null Note: Inline image data limits your total request size (text prompts, system instructions, and inline bytes) to 20MB. For larger requests, upload image files using the File API. Files API is also more efficient for scenarios that use the same image repeatedly. Uploading images using the File API For large files or to be able to use the same image file repeatedly, use the Files API. The following code uploads an image file and then uses the file in a call to generateContent . See the Files API guide for more information and examples. Python from google import genai client = genai . Client () my_file = client . files . upload ( file = "path/to/sample.jpg" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ my_file , "Caption this image." ], ) print ( response . 
text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.jpg" , config : { mimeType : "image/jpeg" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Caption this image." , ]), }); console . log ( response . text ); } await main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } uploadedFile , _ := client . Files . UploadFromPath ( ctx , "path/to/sample.jpg" , nil ) parts := [] * genai . Part { genai . NewPartFromText ( "Caption this image." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST IMAGE_PATH = "path/to/sample.jpg" MIME_TYPE = $( file -b --mime-type " ${ IMAGE_PATH } " ) NUM_BYTES = $( wc -c < " ${ IMAGE_PATH } " ) DISPLAY_NAME = IMAGE tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ IMAGE_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq -r ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"file_data":{"mime_type": "' " ${ MIME_TYPE } " '", "file_uri": "' " ${ file_uri } " '"}}, {"text": "Caption this image."}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Prompting with multiple images You can provide multiple images in a single prompt by including multiple image Part objects in the contents array. These can be a mix of inline data (local files or URLs) and File API references. Python from google import genai from google.genai import types client = genai . Client () # Upload the first image image1_path = "path/to/image1.jpg" uploaded_file = client . files . upload ( file = image1_path ) # Prepare the second image as inline data image2_path = "path/to/image2.png" with open ( image2_path , 'rb' ) as f : img2_bytes = f . 
read () # Create the prompt with text and multiple images response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "What is different between these two images?" , uploaded_file , # Use the uploaded file reference types . Part . from_bytes ( data = img2_bytes , mime_type = 'image/png' ) ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); async function main () { // Upload the first image const image1_path = "path/to/image1.jpg" ; const uploadedFile = await ai . files . upload ({ file : image1_path , config : { mimeType : "image/jpeg" }, }); // Prepare the second image as inline data const image2_path = "path/to/image2.png" ; const base64Image2File = fs . readFileSync ( image2_path , { encoding : "base64" , }); // Create the prompt with text and multiple images const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ "What is different between these two images?" , createPartFromUri ( uploadedFile . uri , uploadedFile . mimeType ), { inlineData : { mimeType : "image/png" , data : base64Image2File , }, }, ]), }); console . log ( response . text ); } await main (); Go // Upload the first image image1Path := "path/to/image1.jpg" uploadedFile , _ := client . Files . UploadFromPath ( ctx , image1Path , nil ) // Prepare the second image as inline data image2Path := "path/to/image2.jpeg" imgBytes , _ := os . ReadFile ( image2Path ) parts := [] * genai . Part { genai . NewPartFromText ( "What is different between these two images?" ), genai . NewPartFromBytes ( imgBytes , "image/jpeg" ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . 
Text ()) REST # Upload the first image IMAGE1_PATH = "path/to/image1.jpg" MIME1_TYPE = $( file -b --mime-type " ${ IMAGE1_PATH } " ) NUM1_BYTES = $( wc -c < " ${ IMAGE1_PATH } " ) DISPLAY_NAME1 = IMAGE1 tmp_header_file1 = upload-header1.tmp curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header1.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM1_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME1_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME1 } '}}" 2 > /dev/null upload_url1 = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file1 } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file1 } " curl " ${ upload_url1 } " \ -H "Content-Length: ${ NUM1_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ IMAGE1_PATH } " 2 > /dev/null > file_info1.json file1_uri = $( jq ".file.uri" file_info1.json ) echo file1_uri = $file1_uri # Prepare the second image (inline) IMAGE2_PATH = "path/to/image2.png" MIME2_TYPE = $( file -b --mime-type " ${ IMAGE2_PATH } " ) if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMAGE2_BASE64 = $( base64 $B64FLAGS $IMAGE2_PATH ) # Now generate content using both images curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "What is different between these two images?"}, {"file_data":{"mime_type": "' " ${ MIME1_TYPE } " '", "file_uri": ' $file1_uri '}}, { "inline_data": { "mime_type":"' " ${ MIME2_TYPE } " '", "data": "' " $IMAGE2_BASE64 " '" } } ] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Object detection From Gemini 2.0 onwards, models are further trained to detect objects in an image and get their bounding box coordinates. The coordinates, relative to image dimensions, scale to [0, 1000]. You need to descale these coordinates based on your original image size. Python from google import genai from google.genai import types from PIL import Image import json client = genai . Client () prompt = "Detect the all of the prominent items in the image. The box_2d should be [ymin, xmin, ymax, xmax] normalized to 0-1000." image = Image . open ( "/path/to/image.png" ) config = types . GenerateContentConfig ( response_mime_type = "application/json" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ image , prompt ], config = config ) width , height = image . size bounding_boxes = json . loads ( response . text ) converted_bounding_boxes = [] for bounding_box in bounding_boxes : abs_y1 = int ( bounding_box [ "box_2d" ][ 0 ] / 1000 * height ) abs_x1 = int ( bounding_box [ "box_2d" ][ 1 ] / 1000 * width ) abs_y2 = int ( bounding_box [ "box_2d" ][ 2 ] / 1000 * height ) abs_x2 = int ( bounding_box [ "box_2d" ][ 3 ] / 1000 * width ) converted_bounding_boxes . append ([ abs_x1 , abs_y1 , abs_x2 , abs_y2 ]) print ( "Image size: " , width , height ) print ( "Bounding boxes:" , converted_bounding_boxes ) Note: The model also supports generating bounding boxes based on custom instructions, such as: "Show bounding boxes of all green objects in this image". 
It also support custom labels like "label the items with the allergens they can contain". For more examples, check following notebooks in the Gemini Cookbook : 2D spatial understanding notebook Experimental 3D pointing notebook Segmentation Starting with Gemini 2.5, models not only detect items but also segment them and provide their contour masks. The model predicts a JSON list, where each item represents a segmentation mask. Each item has a bounding box (" box_2d ") in the format [y0, x0, y1, x1] with normalized coordinates between 0 and 1000, a label (" label ") that identifies the object, and finally the segmentation mask inside the bounding box, as base64 encoded png that is a probability map with values between 0 and 255. The mask needs to be resized to match the bounding box dimensions, then binarized at your confidence threshold (127 for the midpoint). Note: For better results, disable thinking by setting the thinking budget to 0. See code sample below for an example. Python from google import genai from google.genai import types from PIL import Image , ImageDraw import io import base64 import json import numpy as np import os client = genai . Client () def parse_json ( json_output : str ): # Parsing out the markdown fencing lines = json_output . splitlines () for i , line in enumerate ( lines ): if line == "```json" : json_output = " \n " . join ( lines [ i + 1 :]) # Remove everything before "```json" output = json_output . split ( "```" )[ 0 ] # Remove everything after the closing "```" break # Exit the loop once "```json" is found return json_output def extract_segmentation_masks ( image_path : str , output_dir : str = "segmentation_outputs" ): # Load and resize image im = Image . open ( image_path ) im . thumbnail ([ 1024 , 1024 ], Image . Resampling . LANCZOS ) prompt = """ Give the segmentation masks for the wooden and glass items. Output a JSON list of segmentation masks where each entry contains the 2D bounding box in the key "box_2d", the segmentation mask in key "mask", and the text label in the key "label". Use descriptive labels. """ config = types . GenerateContentConfig ( thinking_config = types . ThinkingConfig ( thinking_budget = 0 ) # set thinking_budget to 0 for better results in object detection ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ prompt , im ], # Pillow images can be directly passed as inputs (which will be converted by the SDK) config = config ) # Parse JSON response items = json . loads ( parse_json ( response . text )) # Create output directory os . makedirs ( output_dir , exist_ok = True ) # Process each mask for i , item in enumerate ( items ): # Get bounding box coordinates box = item [ "box_2d" ] y0 = int ( box [ 0 ] / 1000 * im . size [ 1 ]) x0 = int ( box [ 1 ] / 1000 * im . size [ 0 ]) y1 = int ( box [ 2 ] / 1000 * im . size [ 1 ]) x1 = int ( box [ 3 ] / 1000 * im . size [ 0 ]) # Skip invalid boxes if y0 > = y1 or x0 > = x1 : continue # Process mask png_str = item [ "mask" ] if not png_str . startswith ( "data:image/png;base64," ): continue # Remove prefix png_str = png_str . removeprefix ( "data:image/png;base64," ) mask_data = base64 . b64decode ( png_str ) mask = Image . open ( io . BytesIO ( mask_data )) # Resize mask to match bounding box mask = mask . resize (( x1 - x0 , y1 - y0 ), Image . Resampling . BILINEAR ) # Convert mask to numpy array for processing mask_array = np . array ( mask ) # Create overlay for this mask overlay = Image . new ( 'RGBA' , im . 
size , ( 0 , 0 , 0 , 0 )) overlay_draw = ImageDraw . Draw ( overlay ) # Create overlay for the mask color = ( 255 , 255 , 255 , 200 ) for y in range ( y0 , y1 ): for x in range ( x0 , x1 ): if mask_array [ y - y0 , x - x0 ] > 128 : # Threshold for mask overlay_draw . point (( x , y ), fill = color ) # Save individual mask and its overlay mask_filename = f " { item [ 'label' ] } _ { i } _mask.png" overlay_filename = f " { item [ 'label' ] } _ { i } _overlay.png" mask . save ( os . path . join ( output_dir , mask_filename )) # Create and save overlay composite = Image . alpha_composite ( im . convert ( 'RGBA' ), overlay ) composite . save ( os . path . join ( output_dir , overlay_filename )) print ( f "Saved mask and overlay for { item [ 'label' ] } to { output_dir } " ) # Example usage if __name__ == "__main__" : extract_segmentation_masks ( "path/to/image.png" ) Check the segmentation example in the cookbook guide for a more detailed example. An example segmentation output with objects and segmentation masks Supported image formats Gemini supports the following image format MIME types: PNG - image/png JPEG - image/jpeg WEBP - image/webp HEIC - image/heic HEIF - image/heif Capabilities All Gemini model versions are multimodal and can be utilized in a wide range of image processing and computer vision tasks including but not limited to image captioning, visual question and answering, image classification, object detection and segmentation. Gemini can reduce the need to use specialized ML models depending on your quality and performance requirements. Some later model versions are specifically trained improve accuracy of specialized tasks in addition to generic capabilities: Gemini 2.0 models are further trained to support enhanced object detection . Gemini 2.5 models are further trained to support enhanced segmentation in addition to object detection . Limitations and key technical information File limit Gemini 2.5 Pro/Flash, 2.0 Flash, 1.5 Pro, and 1.5 Flash support a maximum of 3,600 image files per request. Token calculation Gemini 1.5 Flash and Gemini 1.5 Pro : 258 tokens if both dimensions <= 384 pixels. Larger images are tiled (min tile 256px, max 768px, resized to 768x768), with each tile costing 258 tokens. Gemini 2.0 Flash and Gemini 2.5 Flash/Pro : 258 tokens if both dimensions <= 384 pixels. Larger images are tiled into 768x768 pixel tiles, each costing 258 tokens. Tips and best practices Verify that images are correctly rotated. Use clear, non-blurry images. When using a single image with text, place the text prompt after the image part in the contents array. What's next This guide shows you how to upload image files and generate text outputs from image inputs. To learn more, see the following resources: Files API : Learn more about uploading and managing files for use with Gemini. System instructions : System instructions let you steer the behavior of the model based on your specific needs and use cases. File prompting strategies : The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting. Safety guidance : Sometimes generative AI models produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . 
For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-08 UTC.
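Editor's note: the object detection walkthrough in the page content above stops after converting the normalized box_2d values into pixel coordinates. As a purely illustrative follow-on (not part of the official docs), the short Python sketch below overlays those converted boxes on the source image with Pillow, which the example already imports; the draw_boxes helper name and the styling choices are assumptions.

# Hypothetical helper: draw the pixel-space [x1, y1, x2, y2] boxes produced by
# the descaling step in the object detection example onto the detected image.
from PIL import Image, ImageDraw

def draw_boxes(image: Image.Image, boxes, out_path="detections.png"):
    annotated = image.convert("RGB").copy()
    draw = ImageDraw.Draw(annotated)
    for x1, y1, x2, y2 in boxes:
        draw.rectangle([x1, y1, x2, y2], outline="red", width=3)
    annotated.save(out_path)

# Usage with the variables from the detection snippet above:
# draw_boxes(image, converted_bounding_boxes)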
 
  Title: Image understanding  |  Gemini API  |  Google AI for Developers
  ==================================================
 
+ Image understanding | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Image understanding Gemini models are built to be multimodal from the ground up, unlocking a wide range of image processing and computer vision tasks including but not limited to image captioning, classification, and visual question answering without having to train specialized ML models. Tip: In addition to their general multimodal capabilities, Gemini models (2.0 and newer) offer improved accuracy for specific use cases like object detection and segmentation , through additional training. See the Capabilities section for more details. Passing images to Gemini You can provide images as input to Gemini using two methods: Passing inline image data : Ideal for smaller files (total request size less than 20MB, including prompts). Uploading images using the File API : Recommended for larger files or for reusing images across multiple requests. Passing inline image data You can pass inline image data in the request to generateContent . You can provide image data as Base64 encoded strings or by reading local files directly (depending on the language). The following example shows how to read an image from a local file and pass it to generateContent API for processing. Python from google.genai import types with open ( 'path/to/small-sample.jpg' , 'rb' ) as f : image_bytes = f . read () response = client . models . generate_content ( model = 'gemini-2.5-flash' , contents = [ types . Part . from_bytes ( data = image_bytes , mime_type = 'image/jpeg' , ), 'Caption this image.' ] ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const base64ImageFile = fs . readFileSync ( "path/to/small-sample.jpg" , { encoding : "base64" , }); const contents = [ { inlineData : { mimeType : "image/jpeg" , data : base64ImageFile , }, }, { text : "Caption this image." }, ]; const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : contents , }); console . log ( response . text ); Go bytes , _ := os . ReadFile ( "path/to/small-sample.jpg" ) parts := [] * genai . Part { genai . NewPartFromBytes ( bytes , "image/jpeg" ), genai . NewPartFromText ( "Caption this image." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) REST IMG_PATH = "/path/to/your/image1.jpg" if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ { "inline_data": { "mime_type":"image/jpeg", "data": "' " $( base64 $B64FLAGS $IMG_PATH ) " '" } }, {"text": "Caption this image."}, ] }] }' 2 > /dev/null You can also fetch an image from a URL, convert it to bytes, and pass it to generateContent as shown in the following examples. 
Python from google import genai from google.genai import types import requests image_path = "https://goo.gle/instrument-img" image_bytes = requests . get ( image_path ) . content image = types . Part . from_bytes ( data = image_bytes , mime_type = "image/jpeg" ) client = genai . Client () response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "What is this image?" , image ], ) print ( response . text ) JavaScript import { GoogleGenAI } from "@google/genai" ; async function main () { const ai = new GoogleGenAI ({}); const imageUrl = "https://goo.gle/instrument-img" ; const response = await fetch ( imageUrl ); const imageArrayBuffer = await response . arrayBuffer (); const base64ImageData = Buffer . from ( imageArrayBuffer ). toString ( 'base64' ); const result = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : [ { inlineData : { mimeType : 'image/jpeg' , data : base64ImageData , }, }, { text : "Caption this image." } ], }); console . log ( result . text ); } main (); Go package main import ( "context" "fmt" "os" "io" "net/http" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } // Download the image. imageResp , _ := http . Get ( "https://goo.gle/instrument-img" ) imageBytes , _ := io . ReadAll ( imageResp . Body ) parts := [] * genai . Part { genai . NewPartFromBytes ( imageBytes , "image/jpeg" ), genai . NewPartFromText ( "Caption this image." ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST IMG_URL = "https://goo.gle/instrument-img" MIME_TYPE = $( curl -sIL " $IMG_URL " | grep -i '^content-type:' | awk -F ': ' '{print $2}' | sed 's/\r$//' | head -n 1 ) if [[ -z " $MIME_TYPE " || ! " $MIME_TYPE " == image/* ]] ; then MIME_TYPE = "image/jpeg" fi # Check for macOS if [[ " $( uname ) " == "Darwin" ]] ; then IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 -b 0 ) elif [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 ) else IMAGE_B64 = $( curl -sL " $IMG_URL " | base64 -w0 ) fi curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ { "inline_data": { "mime_type":"' " $MIME_TYPE " '", "data": "' " $IMAGE_B64 " '" } }, {"text": "Caption this image."} ] }] }' 2 > /dev/null Note: Inline image data limits your total request size (text prompts, system instructions, and inline bytes) to 20MB. For larger requests, upload image files using the File API. Files API is also more efficient for scenarios that use the same image repeatedly. Uploading images using the File API For large files or to be able to use the same image file repeatedly, use the Files API. The following code uploads an image file and then uses the file in a call to generateContent . See the Files API guide for more information and examples. Python from google import genai client = genai . Client () my_file = client . files . upload ( file = "path/to/sample.jpg" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ my_file , "Caption this image." ], ) print ( response . 
text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; const ai = new GoogleGenAI ({}); async function main () { const myfile = await ai . files . upload ({ file : "path/to/sample.jpg" , config : { mimeType : "image/jpeg" }, }); const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ createPartFromUri ( myfile . uri , myfile . mimeType ), "Caption this image." , ]), }); console . log ( response . text ); } await main (); Go package main import ( "context" "fmt" "os" "google.golang.org/genai" ) func main () { ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } uploadedFile , _ := client . Files . UploadFromPath ( ctx , "path/to/sample.jpg" , nil ) parts := [] * genai . Part { genai . NewPartFromText ( "Caption this image." ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . Text ()) } REST IMAGE_PATH = "path/to/sample.jpg" MIME_TYPE = $( file -b --mime-type " ${ IMAGE_PATH } " ) NUM_BYTES = $( wc -c < " ${ IMAGE_PATH } " ) DISPLAY_NAME = IMAGE tmp_header_file = upload-header.tmp # Initial resumable request defining metadata. # The upload url is in the response headers dump them to a file. curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME } '}}" 2 > /dev/null upload_url = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file } " # Upload the actual bytes. curl " ${ upload_url } " \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H "Content-Length: ${ NUM_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ IMAGE_PATH } " 2 > /dev/null > file_info.json file_uri = $( jq -r ".file.uri" file_info.json ) echo file_uri = $file_uri # Now generate content using that file curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"file_data":{"mime_type": "' " ${ MIME_TYPE } " '", "file_uri": "' " ${ file_uri } " '"}}, {"text": "Caption this image."}] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Prompting with multiple images You can provide multiple images in a single prompt by including multiple image Part objects in the contents array. These can be a mix of inline data (local files or URLs) and File API references. Python from google import genai from google.genai import types client = genai . Client () # Upload the first image image1_path = "path/to/image1.jpg" uploaded_file = client . files . upload ( file = image1_path ) # Prepare the second image as inline data image2_path = "path/to/image2.png" with open ( image2_path , 'rb' ) as f : img2_bytes = f . 
read () # Create the prompt with text and multiple images response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ "What is different between these two images?" , uploaded_file , # Use the uploaded file reference types . Part . from_bytes ( data = img2_bytes , mime_type = 'image/png' ) ] ) print ( response . text ) JavaScript import { GoogleGenAI , createUserContent , createPartFromUri , } from "@google/genai" ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); async function main () { // Upload the first image const image1_path = "path/to/image1.jpg" ; const uploadedFile = await ai . files . upload ({ file : image1_path , config : { mimeType : "image/jpeg" }, }); // Prepare the second image as inline data const image2_path = "path/to/image2.png" ; const base64Image2File = fs . readFileSync ( image2_path , { encoding : "base64" , }); // Create the prompt with text and multiple images const response = await ai . models . generateContent ({ model : "gemini-2.5-flash" , contents : createUserContent ([ "What is different between these two images?" , createPartFromUri ( uploadedFile . uri , uploadedFile . mimeType ), { inlineData : { mimeType : "image/png" , data : base64Image2File , }, }, ]), }); console . log ( response . text ); } await main (); Go // Upload the first image image1Path := "path/to/image1.jpg" uploadedFile , _ := client . Files . UploadFromPath ( ctx , image1Path , nil ) // Prepare the second image as inline data image2Path := "path/to/image2.jpeg" imgBytes , _ := os . ReadFile ( image2Path ) parts := [] * genai . Part { genai . NewPartFromText ( "What is different between these two images?" ), genai . NewPartFromBytes ( imgBytes , "image/jpeg" ), genai . NewPartFromURI ( uploadedFile . URI , uploadedFile . MIMEType ), } contents := [] * genai . Content { genai . NewContentFromParts ( parts , genai . RoleUser ), } result , _ := client . Models . GenerateContent ( ctx , "gemini-2.5-flash" , contents , nil , ) fmt . Println ( result . 
Text ()) REST # Upload the first image IMAGE1_PATH = "path/to/image1.jpg" MIME1_TYPE = $( file -b --mime-type " ${ IMAGE1_PATH } " ) NUM1_BYTES = $( wc -c < " ${ IMAGE1_PATH } " ) DISPLAY_NAME1 = IMAGE1 tmp_header_file1 = upload-header1.tmp curl "https://generativelanguage.googleapis.com/upload/v1beta/files" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -D upload-header1.tmp \ -H "X-Goog-Upload-Protocol: resumable" \ -H "X-Goog-Upload-Command: start" \ -H "X-Goog-Upload-Header-Content-Length: ${ NUM1_BYTES } " \ -H "X-Goog-Upload-Header-Content-Type: ${ MIME1_TYPE } " \ -H "Content-Type: application/json" \ -d "{'file': {'display_name': ' ${ DISPLAY_NAME1 } '}}" 2 > /dev/null upload_url1 = $( grep -i "x-goog-upload-url: " " ${ tmp_header_file1 } " | cut -d " " -f2 | tr -d "\r" ) rm " ${ tmp_header_file1 } " curl " ${ upload_url1 } " \ -H "Content-Length: ${ NUM1_BYTES } " \ -H "X-Goog-Upload-Offset: 0" \ -H "X-Goog-Upload-Command: upload, finalize" \ --data-binary "@ ${ IMAGE1_PATH } " 2 > /dev/null > file_info1.json file1_uri = $( jq ".file.uri" file_info1.json ) echo file1_uri = $file1_uri # Prepare the second image (inline) IMAGE2_PATH = "path/to/image2.png" MIME2_TYPE = $( file -b --mime-type " ${ IMAGE2_PATH } " ) if [[ " $( base64 --version 2>&1 ) " = * "FreeBSD" * ]] ; then B64FLAGS = "--input" else B64FLAGS = "-w0" fi IMAGE2_BASE64 = $( base64 $B64FLAGS $IMAGE2_PATH ) # Now generate content using both images curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \ -H "x-goog-api-key: $GEMINI_API_KEY " \ -H 'Content-Type: application/json' \ -X POST \ -d '{ "contents": [{ "parts":[ {"text": "What is different between these two images?"}, {"file_data":{"mime_type": "' " ${ MIME1_TYPE } " '", "file_uri": ' $file1_uri '}}, { "inline_data": { "mime_type":"' " ${ MIME2_TYPE } " '", "data": "' " $IMAGE2_BASE64 " '" } } ] }] }' 2 > /dev/null > response.json cat response.json echo jq ".candidates[].content.parts[].text" response.json Object detection From Gemini 2.0 onwards, models are further trained to detect objects in an image and get their bounding box coordinates. The coordinates, relative to image dimensions, scale to [0, 1000]. You need to descale these coordinates based on your original image size. Python from google import genai from google.genai import types from PIL import Image import json client = genai . Client () prompt = "Detect the all of the prominent items in the image. The box_2d should be [ymin, xmin, ymax, xmax] normalized to 0-1000." image = Image . open ( "/path/to/image.png" ) config = types . GenerateContentConfig ( response_mime_type = "application/json" ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ image , prompt ], config = config ) width , height = image . size bounding_boxes = json . loads ( response . text ) converted_bounding_boxes = [] for bounding_box in bounding_boxes : abs_y1 = int ( bounding_box [ "box_2d" ][ 0 ] / 1000 * height ) abs_x1 = int ( bounding_box [ "box_2d" ][ 1 ] / 1000 * width ) abs_y2 = int ( bounding_box [ "box_2d" ][ 2 ] / 1000 * height ) abs_x2 = int ( bounding_box [ "box_2d" ][ 3 ] / 1000 * width ) converted_bounding_boxes . append ([ abs_x1 , abs_y1 , abs_x2 , abs_y2 ]) print ( "Image size: " , width , height ) print ( "Bounding boxes:" , converted_bounding_boxes ) Note: The model also supports generating bounding boxes based on custom instructions, such as: "Show bounding boxes of all green objects in this image". 
It also support custom labels like "label the items with the allergens they can contain". For more examples, check following notebooks in the Gemini Cookbook : 2D spatial understanding notebook Experimental 3D pointing notebook Segmentation Starting with Gemini 2.5, models not only detect items but also segment them and provide their contour masks. The model predicts a JSON list, where each item represents a segmentation mask. Each item has a bounding box (" box_2d ") in the format [y0, x0, y1, x1] with normalized coordinates between 0 and 1000, a label (" label ") that identifies the object, and finally the segmentation mask inside the bounding box, as base64 encoded png that is a probability map with values between 0 and 255. The mask needs to be resized to match the bounding box dimensions, then binarized at your confidence threshold (127 for the midpoint). Note: For better results, disable thinking by setting the thinking budget to 0. See code sample below for an example. Python from google import genai from google.genai import types from PIL import Image , ImageDraw import io import base64 import json import numpy as np import os client = genai . Client () def parse_json ( json_output : str ): # Parsing out the markdown fencing lines = json_output . splitlines () for i , line in enumerate ( lines ): if line == "```json" : json_output = " \n " . join ( lines [ i + 1 :]) # Remove everything before "```json" output = json_output . split ( "```" )[ 0 ] # Remove everything after the closing "```" break # Exit the loop once "```json" is found return json_output def extract_segmentation_masks ( image_path : str , output_dir : str = "segmentation_outputs" ): # Load and resize image im = Image . open ( image_path ) im . thumbnail ([ 1024 , 1024 ], Image . Resampling . LANCZOS ) prompt = """ Give the segmentation masks for the wooden and glass items. Output a JSON list of segmentation masks where each entry contains the 2D bounding box in the key "box_2d", the segmentation mask in key "mask", and the text label in the key "label". Use descriptive labels. """ config = types . GenerateContentConfig ( thinking_config = types . ThinkingConfig ( thinking_budget = 0 ) # set thinking_budget to 0 for better results in object detection ) response = client . models . generate_content ( model = "gemini-2.5-flash" , contents = [ prompt , im ], # Pillow images can be directly passed as inputs (which will be converted by the SDK) config = config ) # Parse JSON response items = json . loads ( parse_json ( response . text )) # Create output directory os . makedirs ( output_dir , exist_ok = True ) # Process each mask for i , item in enumerate ( items ): # Get bounding box coordinates box = item [ "box_2d" ] y0 = int ( box [ 0 ] / 1000 * im . size [ 1 ]) x0 = int ( box [ 1 ] / 1000 * im . size [ 0 ]) y1 = int ( box [ 2 ] / 1000 * im . size [ 1 ]) x1 = int ( box [ 3 ] / 1000 * im . size [ 0 ]) # Skip invalid boxes if y0 > = y1 or x0 > = x1 : continue # Process mask png_str = item [ "mask" ] if not png_str . startswith ( "data:image/png;base64," ): continue # Remove prefix png_str = png_str . removeprefix ( "data:image/png;base64," ) mask_data = base64 . b64decode ( png_str ) mask = Image . open ( io . BytesIO ( mask_data )) # Resize mask to match bounding box mask = mask . resize (( x1 - x0 , y1 - y0 ), Image . Resampling . BILINEAR ) # Convert mask to numpy array for processing mask_array = np . array ( mask ) # Create overlay for this mask overlay = Image . new ( 'RGBA' , im . 
size , ( 0 , 0 , 0 , 0 )) overlay_draw = ImageDraw . Draw ( overlay ) # Create overlay for the mask color = ( 255 , 255 , 255 , 200 ) for y in range ( y0 , y1 ): for x in range ( x0 , x1 ): if mask_array [ y - y0 , x - x0 ] > 128 : # Threshold for mask overlay_draw . point (( x , y ), fill = color ) # Save individual mask and its overlay mask_filename = f " { item [ 'label' ] } _ { i } _mask.png" overlay_filename = f " { item [ 'label' ] } _ { i } _overlay.png" mask . save ( os . path . join ( output_dir , mask_filename )) # Create and save overlay composite = Image . alpha_composite ( im . convert ( 'RGBA' ), overlay ) composite . save ( os . path . join ( output_dir , overlay_filename )) print ( f "Saved mask and overlay for { item [ 'label' ] } to { output_dir } " ) # Example usage if __name__ == "__main__" : extract_segmentation_masks ( "path/to/image.png" ) Check the segmentation example in the cookbook guide for a more detailed example. An example segmentation output with objects and segmentation masks Supported image formats Gemini supports the following image format MIME types: PNG - image/png JPEG - image/jpeg WEBP - image/webp HEIC - image/heic HEIF - image/heif Capabilities All Gemini model versions are multimodal and can be utilized in a wide range of image processing and computer vision tasks including but not limited to image captioning, visual question and answering, image classification, object detection and segmentation. Gemini can reduce the need to use specialized ML models depending on your quality and performance requirements. Some later model versions are specifically trained improve accuracy of specialized tasks in addition to generic capabilities: Gemini 2.0 models are further trained to support enhanced object detection . Gemini 2.5 models are further trained to support enhanced segmentation in addition to object detection . Limitations and key technical information File limit Gemini 2.5 Pro/Flash, 2.0 Flash, 1.5 Pro, and 1.5 Flash support a maximum of 3,600 image files per request. Token calculation Gemini 1.5 Flash and Gemini 1.5 Pro : 258 tokens if both dimensions <= 384 pixels. Larger images are tiled (min tile 256px, max 768px, resized to 768x768), with each tile costing 258 tokens. Gemini 2.0 Flash and Gemini 2.5 Flash/Pro : 258 tokens if both dimensions <= 384 pixels. Larger images are tiled into 768x768 pixel tiles, each costing 258 tokens. A rough formula for calculating the number of tiles is as follows: Calculate the crop unit size which is roughly: floor(min(width, height) / 1.5). Divide each dimension by the crop unit size and multiply together to get the number of tiles. For example, for an image of dimensions 960x540 would have a crop unit size of 360. Divide each dimension by 360 and the number of tile is 3 * 2 = 6. Tips and best practices Verify that images are correctly rotated. Use clear, non-blurry images. When using a single image with text, place the text prompt after the image part in the contents array. What's next This guide shows you how to upload image files and generate text outputs from image inputs. To learn more, see the following resources: Files API : Learn more about uploading and managing files for use with Gemini. System instructions : System instructions let you steer the behavior of the model based on your specific needs and use cases. File prompting strategies : The Gemini API supports prompting with text, image, audio, and video data, also known as multimodal prompting. 
Safety guidance : Sometimes generative AI models produce unexpected outputs, such as outputs that are inaccurate, biased, or offensive. Post-processing and human evaluation are essential to limit the risk of harm from such outputs. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-05 UTC.
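Editor's note: the token calculation rules in the updated page above give a worked example (a 960x540 image has a crop unit of 360 and 3 * 2 = 6 tiles) but no code. The following is a minimal sketch of that arithmetic for Gemini 2.0/2.5 models; the ceiling rounding is inferred from the worked example, and the actual server-side accounting may differ.

# Rough token estimate per the tiling rules described above (sketch only).
import math

def estimate_image_tokens(width: int, height: int) -> int:
    if width <= 384 and height <= 384:
        return 258  # small images cost a flat 258 tokens
    crop_unit = math.floor(min(width, height) / 1.5)
    tiles = math.ceil(width / crop_unit) * math.ceil(height / crop_unit)
    return tiles * 258  # each 768x768 tile costs 258 tokens

print(estimate_image_tokens(960, 540))  # 6 tiles -> 1548 tokens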
text_content/docs_libraries_c54e7094.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/libraries#install
  Title: Gemini API libraries  |  Google AI for Developers
  ==================================================
 
- Gemini API libraries | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Gemini API libraries When building with the Gemini API, we recommend using our official collection of libraries across major languages: the Google GenAI SDK . They are production ready under General Availability . Our samples and documentation across this site are built using these libraries. Note: If you're using one of our legacy libraries, we strongly recommend you migrate to the Google GenAI SDK. Review the legacy libraries section for more information. If you're new to the Gemini API, follow our quickstart guide to get started. Language support and installation The Google GenAI SDK is available for the Python, JavaScript/TypeScript, Go and Java languages. You can install each language's library using package managers, or visit their GitHub repos for further engagement: Python Library: google-genai GitHub Repository: googleapis/python-genai Installation: pip install google-genai JavaScript Library: @google/genai GitHub Repository: googleapis/js-genai Installation: npm install @google/genai Go Library: google.golang.org/genai GitHub Repository: googleapis/go-genai Installation: go get google.golang.org/genai Java Library: google-genai GitHub Repository: googleapis/java-genai Installation: If you're using Maven, add the following to your dependencies: <dependencies> <dependency> <groupId>com.google.genai</groupId> <artifactId>google-genai</artifactId> <version>1.0.0</version> </dependency> </dependencies> General availability We started rolling out the Google GenAI SDK in late 2024. As of May 2025, it reached General Availability (GA) across all supported platforms. This means the libraries are stable and fully supported for production use. They are actively maintained, provide access to the latest features, and offer the best performance working with Gemini. If you're not using the Google GenAI SDK and using one of our legacy libraries, we strongly recommend you to migrate. Review the legacy libraries section for more information. Legacy libraries and migration If you are using one of our legacy libraries, we recommend that you migrate to the new libraries . The legacy libraries don't provide access to recent features (such as Live API and Veo ) and are on a deprecation path. They will stop receiving updates at the end of September 2025, the feature gaps will grow and potential bugs may no longer get fixed. Each legacy library's support status varies, detailed in the following table: Language Legacy library Support status Recommended library Python google-generativeai All support, including bug fixes, ends end of September 2025. google-genai JavaScript/TypeScript @google/generativeai All support, including bug fixes, ends end of September 2025. @google/genai Go google.golang.org/generative-ai All support, including bug fixes, ends end of September 2025. 
google.golang.org/genai Dart and Flutter google_generative_ai Not actively maintained Use trusted community or third party libraries, like firebase_ai , or access using REST API Swift generative-ai-swift Not actively maintained Use Gemini in Firebase Android generative-ai-android Not actively maintained Use Gemini in Firebase Note for Java developers: There was no legacy Google-provided Java SDK for the Gemini API, so no migration from a previous Google library is required. You can start directly with the new library in the Language support and installation section. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-09 UTC.
 
  Title: Gemini API libraries  |  Google AI for Developers
  ==================================================
 
+ Gemini API libraries | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Gemini API libraries When building with the Gemini API, we recommend using the Google GenAI SDK . These are the official, production-ready libraries that we develop and maintain for the most popular languages. They are in General Availability and used in all our official documentation and examples. Note: If you're using one of our legacy libraries, we strongly recommend you migrate to the Google GenAI SDK. Review the legacy libraries section for more information. If you're new to the Gemini API, follow our quickstart guide to get started. Language support and installation The Google GenAI SDK is available for the Python, JavaScript/TypeScript, Go and Java languages. You can install each language's library using package managers, or visit their GitHub repos for further engagement: Python Library: google-genai GitHub Repository: googleapis/python-genai Installation: pip install google-genai JavaScript Library: @google/genai GitHub Repository: googleapis/js-genai Installation: npm install @google/genai Go Library: google.golang.org/genai GitHub Repository: googleapis/go-genai Installation: go get google.golang.org/genai Java Library: google-genai GitHub Repository: googleapis/java-genai Installation: If you're using Maven, add the following to your dependencies: <dependencies> <dependency> <groupId>com.google.genai</groupId> <artifactId>google-genai</artifactId> <version>1.0.0</version> </dependency> </dependencies> General availability We started rolling out Google GenAI SDK, a new set of libraries to access Gemini API, in late 2024 when we launched Gemini 2.0. As of May 2025, they reached General Availability (GA) across all supported platforms and are the recommended libraries to access the Gemini API. They are stable, fully supported for production use, and are actively maintained. They provide access to the latest features, and offer the best performance working with Gemini. If you're using one of our legacy libraries, we strongly recommend you migrate so that you can access the latest features and get the best performance working with Gemini. Review the legacy libraries section for more information. Legacy libraries and migration If you are using one of our legacy libraries, we recommend that you migrate to the new libraries . The legacy libraries don't provide access to recent features (such as Live API and Veo ) and are on a deprecation path. They will stop receiving updates on November 30th, 2025, the feature gaps will grow and potential bugs may no longer get fixed. Each legacy library's support status varies, detailed in the following table: Language Legacy library Support status Recommended library Python google-generativeai All support, including bug fixes, ends on November 30th, 2025. google-genai JavaScript/TypeScript @google/generativeai All support, including bug fixes, ends on November 30th, 2025. @google/genai Go google.golang.org/generative-ai All support, including bug fixes, ends on November 30th, 2025. 
google.golang.org/genai Dart and Flutter google_generative_ai Not actively maintained Use trusted community or third party libraries, like firebase_ai , or access using REST API Swift generative-ai-swift Not actively maintained Use Firebase AI Logic Android generative-ai-android Not actively maintained Use Firebase AI Logic Note for Java developers: There was no legacy Google-provided Java SDK for the Gemini API, so no migration from a previous Google library is required. You can start directly with the new library in the Language support and installation section. Prompt templates for code generation Generative models (e.g., Gemini, Claude) and AI-powered IDEs (e.g., Cursor) may produce code for the Gemini API using outdated or deprecated libraries due to their training data cutoff. For the generated code to use the latest, recommended libraries, provide version and usage guidance directly in your prompts. You can use the templates below to provide the necessary context: Python JavaScript/TypeScript Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
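Editor's note: since the updated libraries page above centers on migrating from the legacy packages before support ends on November 30th, 2025, a minimal before/after sketch in Python may help; the model names and prompt are placeholders, and the legacy snippet is shown only for contrast with the Google GenAI SDK usage documented above.

# Before: legacy google-generativeai package (deprecated, see the table above).
# import google.generativeai as legacy_genai
# legacy_genai.configure(api_key="YOUR_API_KEY")
# model = legacy_genai.GenerativeModel("gemini-1.5-flash")
# print(model.generate_content("Hello").text)

# After: Google GenAI SDK (pip install google-genai). The client picks up
# GEMINI_API_KEY from the environment by default.
from google import genai

client = genai.Client()
response = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder model name
    contents="Hello",
)
print(response.text)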
text_content/docs_live-guide_51ee6d26.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/live-guide#native-audio-output-thinki
  Title: Live API capabilities guide  |  Gemini API  |  Google AI for Developers
  ==================================================
 
- Live API capabilities guide | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Live API capabilities guide Preview: The Live API is in preview. This is a comprehensive guide that covers capabilities and configurations available with the Live API. See Get started with Live API page for a overview and sample code for common use cases. Before you begin Familiarize yourself with core concepts: If you haven't already done so, read the Get started with Live API page first. This will introduce you to the fundamental principles of the Live API, how it works, and the distinction between the different models and their corresponding audio generation methods ( native audio or half-cascade). Try the Live API in AI Studio: You may find it useful to try the Live API in Google AI Studio before you start building. To use the Live API in Google AI Studio, select Stream . Establishing a connection The following example shows how to create a connection with an API key: Python import asyncio from google import genai client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : print ( "Session started" ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . TEXT ] }; async function main () { const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { console . debug ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send content... session . close (); } main (); Note: You can only set one modality in the response_modalities field. This means that you can configure the model to respond with either text or audio, but not both in the same session. Interaction modalities The following sections provide examples and supporting context for the different input and output modalities available in Live API. Sending and receiving text Here's how you can send and receive text: Python import asyncio from google import genai client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : message = "Hello, how are you?" await session . send_client_content ( turns = { "role" : "user" , "parts" : [{ "text" : message }]}, turn_complete = True ) async for response in session . receive (): if response . text is not None : print ( response . text , end = "" ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . 
TEXT ] }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Hello how are you?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . text ) { console . debug ( 'Received text: %s\n' , turn . text ); } else if ( turn . data ) { console . debug ( 'Received inline data: %s\n' , turn . data ); } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Incremental content updates Use incremental updates to send text input, establish session context, or restore session context. For short contexts you can send turn-by-turn interactions to represent the exact sequence of events: Python turns = [ { "role" : "user" , "parts" : [{ "text" : "What is the capital of France?" }]}, { "role" : "model" , "parts" : [{ "text" : "Paris" }]}, ] await session . send_client_content ( turns = turns , turn_complete = False ) turns = [{ "role" : "user" , "parts" : [{ "text" : "What is the capital of Germany?" }]}] await session . send_client_content ( turns = turns , turn_complete = True ) JavaScript let inputTurns = [ { "role" : "user" , "parts" : [{ "text" : "What is the capital of France?" }] }, { "role" : "model" , "parts" : [{ "text" : "Paris" }] }, ] session . sendClientContent ({ turns : inputTurns , turnComplete : false }) inputTurns = [{ "role" : "user" , "parts" : [{ "text" : "What is the capital of Germany?" }] }] session . sendClientContent ({ turns : inputTurns , turnComplete : true }) For longer contexts it's recommended to provide a single message summary to free up the context window for subsequent interactions. See Session Resumption for another method for loading session context. Sending and receiving audio The most common audio example, audio-to-audio , is covered in the Getting started guide. Here's an audio-to-text example that reads a WAV file, sends it in the correct format and receives text output: Python # Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav # Install helpers for converting files: pip install librosa soundfile import asyncio import io from pathlib import Path from google import genai from google.genai import types import soundfile as sf import librosa client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : buffer = io . BytesIO () y , sr = librosa . load ( "sample.wav" , sr = 16000 ) sf . 
write ( buffer , y , sr , format = 'RAW' , subtype = 'PCM_16' ) buffer . seek ( 0 ) audio_bytes = buffer . read () # If already in correct format, you can use this: # audio_bytes = Path("sample.pcm").read_bytes() await session . send_realtime_input ( audio = types . Blob ( data = audio_bytes , mime_type = "audio/pcm;rate=16000" ) ) async for response in session . receive (): if response . text is not None : print ( response . text ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript // Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav // Install helpers for converting files: npm install wavefile import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; import pkg from 'wavefile' ; const { WaveFile } = pkg ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . TEXT ] }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send Audio Chunk const fileBuffer = fs . readFileSync ( "sample.wav" ); // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono) const wav = new WaveFile (); wav . fromBuffer ( fileBuffer ); wav . toSampleRate ( 16000 ); wav . toBitDepth ( "16" ); const base64Audio = wav . toBase64 (); // If already in correct format, you can use this: // const fileBuffer = fs.readFileSync("sample.pcm"); // const base64Audio = Buffer.from(fileBuffer).toString('base64'); session . sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . text ) { console . debug ( 'Received text: %s\n' , turn . text ); } else if ( turn . data ) { console . debug ( 'Received inline data: %s\n' , turn . data ); } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); And here is a text-to-audio example. You can receive audio by setting AUDIO as response modality. This example saves the received data as WAV file: Python import asyncio import wave from google import genai client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "AUDIO" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : wf = wave . open ( "audio.wav" , "wb" ) wf . setnchannels ( 1 ) wf . setsampwidth ( 2 ) wf . setframerate ( 24000 ) message = "Hello how are you?" await session . 
send_client_content ( turns = { "role" : "user" , "parts" : [{ "text" : message }]}, turn_complete = True ) async for response in session . receive (): if response . data is not None : wf . writeframes ( response . data ) # Un-comment this code to print audio data info # if response.server_content.model_turn is not None: # print(response.server_content.model_turn.parts[0].inline_data.mime_type) wf . close () if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; import pkg from 'wavefile' ; const { WaveFile } = pkg ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . AUDIO ] }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Hello how are you?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); // Combine audio data strings and save as wave file const combinedAudio = turns . reduce (( acc , turn ) = > { if ( turn . data ) { const buffer = Buffer . from ( turn . data , 'base64' ); const intArray = new Int16Array ( buffer . buffer , buffer . byteOffset , buffer . byteLength / Int16Array . BYTES_PER_ELEMENT ); return acc . concat ( Array . from ( intArray )); } return acc ; }, []); const audioBuffer = new Int16Array ( combinedAudio ); const wf = new WaveFile (); wf . fromScratch ( 1 , 24000 , '16' , audioBuffer ); fs . writeFileSync ( 'output.wav' , wf . toBuffer ()); session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Audio formats Audio data in the Live API is always raw, little-endian, 16-bit PCM. Audio output always uses a sample rate of 24kHz. Input audio is natively 16kHz, but the Live API will resample if needed so any sample rate can be sent. To convey the sample rate of input audio, set the MIME type of each audio-containing Blob to a value like audio/pcm;rate=16000 . Audio transcriptions You can enable transcription of the model's audio output by sending output_audio_transcription in the setup config. The transcription language is inferred from the model's response. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "AUDIO" ], "output_audio_transcription" : {} } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : message = "Hello? Gemini are you there?" await session . 
send_client_content ( turns = { "role" : "user" , "parts" : [{ "text" : message }]}, turn_complete = True ) async for response in session . receive (): if response . server_content . model_turn : print ( "Model turn:" , response . server_content . model_turn ) if response . server_content . output_transcription : print ( "Transcript:" , response . server_content . output_transcription . text ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . AUDIO ], outputAudioTranscription : {} }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Hello how are you?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . outputTranscription ) { console . debug ( 'Received output transcription: %s\n' , turn . serverContent . outputTranscription . text ); } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); You can enable transcription of the audio input by sending input_audio_transcription in setup config. Python import asyncio from pathlib import Path from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ], "input_audio_transcription" : {}, } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : audio_data = Path ( "16000.pcm" ) . read_bytes () await session . send_realtime_input ( audio = types . Blob ( data = audio_data , mime_type = 'audio/pcm;rate=16000' ) ) async for msg in session . receive (): if msg . server_content . input_transcription : print ( 'Transcript:' , msg . server_content . input_transcription . text ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; import pkg from 'wavefile' ; const { WaveFile } = pkg ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . TEXT ], inputAudioTranscription : {} }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . 
shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send Audio Chunk const fileBuffer = fs . readFileSync ( "16000.wav" ); // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono) const wav = new WaveFile (); wav . fromBuffer ( fileBuffer ); wav . toSampleRate ( 16000 ); wav . toBitDepth ( "16" ); const base64Audio = wav . toBase64 (); // If already in correct format, you can use this: // const fileBuffer = fs.readFileSync("sample.pcm"); // const base64Audio = Buffer.from(fileBuffer).toString('base64'); session . sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . outputTranscription ) { console . log ( "Transcription" ) console . log ( turn . serverContent . outputTranscription . text ); } } for ( const turn of turns ) { if ( turn . text ) { console . debug ( 'Received text: %s\n' , turn . text ); } else if ( turn . data ) { console . debug ( 'Received inline data: %s\n' , turn . data ); } else if ( turn . serverContent && turn . serverContent . inputTranscription ) { console . debug ( 'Received input transcription: %s\n' , turn . serverContent . inputTranscription . text ); } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Stream audio and video To see an example of how to use the Live API in a streaming audio and video format, run the "Live API - Get Started" file in the cookbooks repository: View on Colab Change voice and language The Live API models each support a different set of voices. Half-cascade supports Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr. Native audio supports a much longer list (identical to the TTS model list ). You can listen to all the voices in AI Studio . To specify a voice, set the voice name within the speechConfig object as part of the session configuration: Python config = { "response_modalities" : [ "AUDIO" ], "speech_config" : { "voice_config" : { "prebuilt_voice_config" : { "voice_name" : "Kore" }} }, } JavaScript const config = { responseModalities : [ Modality . AUDIO ], speechConfig : { voiceConfig : { prebuiltVoiceConfig : { voiceName : "Kore" } } } }; Note: If you're using the generateContent API, the set of available voices is slightly different. See the audio generation guide for generateContent audio generation voices. The Live API supports multiple languages . To change the language, set the language code within the speechConfig object as part of the session configuration: Python config = { "response_modalities" : [ "AUDIO" ], "speech_config" : { "language_code" : "de-DE" } } JavaScript const config = { responseModalities : [ Modality . 
AUDIO ], speechConfig : { languageCode : "de-DE" } }; Note: Native audio output models automatically choose the appropriate language and don't support explicitly setting the language code. Native audio capabilities The following capabilities are only available with native audio. You can learn more about native audio in Choose a model and audio generation . Note: Native audio models currently have limited tool use support. See Overview of supported tools for details. How to use native audio output To use native audio output, configure one of the native audio models and set response_modalities to AUDIO . See Send and receive audio for a full example. Python model = "gemini-2.5-flash-preview-native-audio-dialog" config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ]) async with client . aio . live . connect ( model = model , config = config ) as session : # Send audio input and receive audio JavaScript const model = 'gemini-2.5-flash-preview-native-audio-dialog' ; const config = { responseModalities : [ Modality . AUDIO ] }; async function main () { const session = await ai . live . connect ({ model : model , config : config , callbacks : ..., }); // Send audio input and receive audio session . close (); } main (); Affective dialog This feature lets Gemini adapt its response style to the input expression and tone. To use affective dialog, set the api version to v1alpha and set enable_affective_dialog to true in the setup message: Python client = genai . Client ( http_options = { "api_version" : "v1alpha" }) config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ], enable_affective_dialog = True ) JavaScript const ai = new GoogleGenAI ({ httpOptions : { "apiVersion" : "v1alpha" } }); const config = { responseModalities : [ Modality . AUDIO ], enableAffectiveDialog : true }; Note that affective dialog is currently only supported by the native audio output models. Proactive audio When this feature is enabled, Gemini can proactively decide not to respond if the content is not relevant. To use it, set the api version to v1alpha and configure the proactivity field in the setup message and set proactive_audio to true : Python client = genai . Client ( http_options = { "api_version" : "v1alpha" }) config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ], proactivity = { 'proactive_audio' : True } ) JavaScript const ai = new GoogleGenAI ({ httpOptions : { "apiVersion" : "v1alpha" } }); const config = { responseModalities : [ Modality . AUDIO ], proactivity : { proactiveAudio : true } } Note that proactive audio is currently only supported by the native audio output models. Native audio output with thinking Native audio output supports thinking capabilities , available via a separate model gemini-2.5-flash-exp-native-audio-thinking-dialog . See Send and receive audio for a full example. Python model = "gemini-2.5-flash-exp-native-audio-thinking-dialog" config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ]) async with client . aio . live . connect ( model = model , config = config ) as session : # Send audio input and receive audio JavaScript const model = 'gemini-2.5-flash-exp-native-audio-thinking-dialog' ; const config = { responseModalities : [ Modality . AUDIO ] }; async function main () { const session = await ai . live . connect ({ model : model , config : config , callbacks : ..., }); // Send audio input and receive audio session . 
close (); } main (); Voice Activity Detection (VAD) Voice Activity Detection (VAD) allows the model to recognize when a person is speaking. This is essential for creating natural conversations, as it allows a user to interrupt the model at any time. When VAD detects an interruption, the ongoing generation is canceled and discarded. Only the information already sent to the client is retained in the session history. The server then sends a BidiGenerateContentServerContent message to report the interruption. The Gemini server then discards any pending function calls and sends a BidiGenerateContentServerContent message with the IDs of the canceled calls. Python async for response in session . receive (): if response . server_content . interrupted is True : # The generation was interrupted # If realtime playback is implemented in your application, # you should stop playing audio and clear queued playback here. JavaScript const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . interrupted ) { // The generation was interrupted // If realtime playback is implemented in your application, // you should stop playing audio and clear queued playback here. } } Automatic VAD By default, the model automatically performs VAD on a continuous audio input stream. VAD can be configured with the realtimeInputConfig.automaticActivityDetection field of the setup configuration . When the audio stream is paused for more than a second (for example, because the user switched off the microphone), an audioStreamEnd event should be sent to flush any cached audio. The client can resume sending audio data at any time. Python # example audio file to try: # URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm" # !wget -q $URL -O sample.pcm import asyncio from pathlib import Path from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : audio_bytes = Path ( "sample.pcm" ) . read_bytes () await session . send_realtime_input ( audio = types . Blob ( data = audio_bytes , mime_type = "audio/pcm;rate=16000" ) ) # if stream gets paused, send: # await session.send_realtime_input(audio_stream_end=True) async for response in session . receive (): if response . text is not None : print ( response . text ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript // example audio file to try: // URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm" // !wget -q $URL -O sample.pcm import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . TEXT ] }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . 
turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send Audio Chunk const fileBuffer = fs . readFileSync ( "sample.pcm" ); const base64Audio = Buffer . from ( fileBuffer ). toString ( 'base64' ); session . sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); // if stream gets paused, send: // session.sendRealtimeInput({ audioStreamEnd: true }) const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . text ) { console . debug ( 'Received text: %s\n' , turn . text ); } else if ( turn . data ) { console . debug ( 'Received inline data: %s\n' , turn . data ); } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); With send_realtime_input , the API will respond to audio automatically based on VAD. While send_client_content adds messages to the model context in order, send_realtime_input is optimized for responsiveness at the expense of deterministic ordering. Automatic VAD configuration For more control over the VAD activity, you can configure the following parameters. See API reference for more info. Python from google.genai import types config = { "response_modalities" : [ "TEXT" ], "realtime_input_config" : { "automatic_activity_detection" : { "disabled" : False , # default "start_of_speech_sensitivity" : types . StartSensitivity . START_SENSITIVITY_LOW , "end_of_speech_sensitivity" : types . EndSensitivity . END_SENSITIVITY_LOW , "prefix_padding_ms" : 20 , "silence_duration_ms" : 100 , } } } JavaScript import { GoogleGenAI , Modality , StartSensitivity , EndSensitivity } from '@google/genai' ; const config = { responseModalities : [ Modality . TEXT ], realtimeInputConfig : { automaticActivityDetection : { disabled : false , // default startOfSpeechSensitivity : StartSensitivity . START_SENSITIVITY_LOW , endOfSpeechSensitivity : EndSensitivity . END_SENSITIVITY_LOW , prefixPaddingMs : 20 , silenceDurationMs : 100 , } } }; Disable automatic VAD Alternatively, the automatic VAD can be disabled by setting realtimeInputConfig.automaticActivityDetection.disabled to true in the setup message. In this configuration the client is responsible for detecting user speech and sending activityStart and activityEnd messages at the appropriate times. An audioStreamEnd isn't sent in this configuration. Instead, any interruption of the stream is marked by an activityEnd message. Python config = { "response_modalities" : [ "TEXT" ], "realtime_input_config" : { "automatic_activity_detection" : { "disabled" : True }}, } async with client . aio . live . connect ( model = model , config = config ) as session : # ... await session . send_realtime_input ( activity_start = types . ActivityStart ()) await session . send_realtime_input ( audio = types . Blob ( data = audio_bytes , mime_type = "audio/pcm;rate=16000" ) ) await session . send_realtime_input ( activity_end = types . ActivityEnd ()) # ... JavaScript const config = { responseModalities : [ Modality . TEXT ], realtimeInputConfig : { automaticActivityDetection : { disabled : true , } } }; session . sendRealtimeInput ({ activityStart : {} }) session . 
sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); session . sendRealtimeInput ({ activityEnd : {} }) Token count You can find the total number of consumed tokens in the usageMetadata field of the returned server message. Python async for message in session . receive (): # The server will periodically send messages that include UsageMetadata. if message . usage_metadata : usage = message . usage_metadata print ( f "Used { usage . total_token_count } tokens in total. Response token breakdown:" ) for detail in usage . response_tokens_details : match detail : case types . ModalityTokenCount ( modality = modality , token_count = count ): print ( f " { modality } : { count } " ) JavaScript const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . usageMetadata ) { console . debug ( 'Used %s tokens in total. Response token breakdown:\n' , turn . usageMetadata . totalTokenCount ); for ( const detail of turn . usageMetadata . responseTokensDetails ) { console . debug ( '%s\n' , detail ); } } } Media resolution You can specify the media resolution for the input media by setting the mediaResolution field as part of the session configuration: Python from google.genai import types config = { "response_modalities" : [ "AUDIO" ], "media_resolution" : types . MediaResolution . MEDIA_RESOLUTION_LOW , } JavaScript import { GoogleGenAI , Modality , MediaResolution } from '@google/genai' ; const config = { responseModalities : [ Modality . TEXT ], mediaResolution : MediaResolution . MEDIA_RESOLUTION_LOW , }; Limitations Consider the following limitations of the Live API when you plan your project. Response modalities You can only set one response modality ( TEXT or AUDIO ) per session in the session configuration. Setting both results in a config error message. This means that you can configure the model to respond with either text or audio, but not both in the same session. Client authentication The Live API only provides server-to-server authentication by default. If you're implementing your Live API application using a client-to-server approach , you need to use ephemeral tokens to mitigate security risks. Session duration Audio-only sessions are limited to 15 minutes, and audio plus video sessions are limited to 2 minutes. However, you can use session management techniques to extend sessions beyond these limits. Context window A session has a context window limit of: 128k tokens for native audio output models 32k tokens for other Live API models Supported languages The Live API supports the following languages. Note: Native audio output models automatically choose the appropriate language and don't support explicitly setting the language code. 
Language: BCP-47 code
German (Germany): de-DE
English (Australia)*: en-AU
English (UK)*: en-GB
English (India): en-IN
English (US): en-US
Spanish (US): es-US
French (France): fr-FR
Hindi (India): hi-IN
Portuguese (Brazil): pt-BR
Arabic (Generic): ar-XA
Spanish (Spain)*: es-ES
French (Canada)*: fr-CA
Indonesian (Indonesia): id-ID
Italian (Italy): it-IT
Japanese (Japan): ja-JP
Turkish (Turkey): tr-TR
Vietnamese (Vietnam): vi-VN
Bengali (India): bn-IN
Gujarati (India)*: gu-IN
Kannada (India)*: kn-IN
Marathi (India): mr-IN
Malayalam (India)*: ml-IN
Tamil (India): ta-IN
Telugu (India): te-IN
Dutch (Netherlands): nl-NL
Korean (South Korea): ko-KR
Mandarin Chinese (China)*: cmn-CN
Polish (Poland): pl-PL
Russian (Russia): ru-RU
Thai (Thailand): th-TH
Languages marked with an asterisk (*) are not available for Native audio. What's next Read the Tool Use and Session Management guides for essential information on using the Live API effectively. Try the Live API in Google AI Studio . For more info about the Live API models, see Gemini 2.0 Flash Live and Gemini 2.5 Flash Native Audio on the Models page. Try more examples in the Live API cookbook , the Live API Tools cookbook , and the Live API Get Started script . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-08 UTC.
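To tie the preceding sections together, here is a minimal, illustrative sketch (not an official sample from this page) that combines several of the session configuration fields covered in this guide (response modality, voice, language, audio transcriptions, automatic VAD tuning, and media resolution) into a single config for the half-cascade model used in the examples above. Treat it as a starting point: whether a given combination is supported can depend on the model, and native audio models ignore an explicit language_code.

# Illustrative sketch: one session config combining options shown separately in this guide.
# Field names follow the examples above; adjust or drop fields as needed for your model.
import asyncio
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment
model = "gemini-live-2.5-flash-preview"  # half-cascade model used throughout this guide

config = {
    # Only one response modality is allowed per session (TEXT or AUDIO).
    "response_modalities": ["AUDIO"],
    # Voice and language for audio output (language_code is ignored by native audio models).
    "speech_config": {
        "voice_config": {"prebuilt_voice_config": {"voice_name": "Kore"}},
        "language_code": "de-DE",
    },
    # Transcribe both the model's audio output and the user's audio input.
    "output_audio_transcription": {},
    "input_audio_transcription": {},
    # Tune automatic voice activity detection.
    "realtime_input_config": {
        "automatic_activity_detection": {
            "disabled": False,
            "silence_duration_ms": 100,
        }
    },
    # Use a lower resolution for input media.
    "media_resolution": types.MediaResolution.MEDIA_RESOLUTION_LOW,
}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        print("Session started with combined config")
        # Send audio with session.send_realtime_input(...) and iterate over
        # session.receive() exactly as shown in the examples above.

if __name__ == "__main__":
    asyncio.run(main())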
 
2
  Title: Live API capabilities guide  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Live API capabilities guide | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Live API capabilities guide Preview: The Live API is in preview. This is a comprehensive guide that covers capabilities and configurations available with the Live API. See Get started with Live API page for a overview and sample code for common use cases. Before you begin Familiarize yourself with core concepts: If you haven't already done so, read the Get started with Live API page first. This will introduce you to the fundamental principles of the Live API, how it works, and the distinction between the different models and their corresponding audio generation methods ( native audio or half-cascade). Try the Live API in AI Studio: You may find it useful to try the Live API in Google AI Studio before you start building. To use the Live API in Google AI Studio, select Stream . Establishing a connection The following example shows how to create a connection with an API key: Python import asyncio from google import genai client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : print ( "Session started" ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . TEXT ] }; async function main () { const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { console . debug ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send content... session . close (); } main (); Note: You can only set one modality in the response_modalities field. This means that you can configure the model to respond with either text or audio, but not both in the same session. Interaction modalities The following sections provide examples and supporting context for the different input and output modalities available in Live API. Sending and receiving text Here's how you can send and receive text: Python import asyncio from google import genai client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : message = "Hello, how are you?" await session . send_client_content ( turns = { "role" : "user" , "parts" : [{ "text" : message }]}, turn_complete = True ) async for response in session . receive (): if response . text is not None : print ( response . text , end = "" ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . 
TEXT ] }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Hello how are you?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . text ) { console . debug ( 'Received text: %s\n' , turn . text ); } else if ( turn . data ) { console . debug ( 'Received inline data: %s\n' , turn . data ); } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Incremental content updates Use incremental updates to send text input, establish session context, or restore session context. For short contexts you can send turn-by-turn interactions to represent the exact sequence of events: Python turns = [ { "role" : "user" , "parts" : [{ "text" : "What is the capital of France?" }]}, { "role" : "model" , "parts" : [{ "text" : "Paris" }]}, ] await session . send_client_content ( turns = turns , turn_complete = False ) turns = [{ "role" : "user" , "parts" : [{ "text" : "What is the capital of Germany?" }]}] await session . send_client_content ( turns = turns , turn_complete = True ) JavaScript let inputTurns = [ { "role" : "user" , "parts" : [{ "text" : "What is the capital of France?" }] }, { "role" : "model" , "parts" : [{ "text" : "Paris" }] }, ] session . sendClientContent ({ turns : inputTurns , turnComplete : false }) inputTurns = [{ "role" : "user" , "parts" : [{ "text" : "What is the capital of Germany?" }] }] session . sendClientContent ({ turns : inputTurns , turnComplete : true }) For longer contexts it's recommended to provide a single message summary to free up the context window for subsequent interactions. See Session Resumption for another method for loading session context. Sending and receiving audio The most common audio example, audio-to-audio , is covered in the Getting started guide. Here's an audio-to-text example that reads a WAV file, sends it in the correct format and receives text output: Python # Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav # Install helpers for converting files: pip install librosa soundfile import asyncio import io from pathlib import Path from google import genai from google.genai import types import soundfile as sf import librosa client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : buffer = io . BytesIO () y , sr = librosa . load ( "sample.wav" , sr = 16000 ) sf . 
write ( buffer , y , sr , format = 'RAW' , subtype = 'PCM_16' ) buffer . seek ( 0 ) audio_bytes = buffer . read () # If already in correct format, you can use this: # audio_bytes = Path("sample.pcm").read_bytes() await session . send_realtime_input ( audio = types . Blob ( data = audio_bytes , mime_type = "audio/pcm;rate=16000" ) ) async for response in session . receive (): if response . text is not None : print ( response . text ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript // Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav // Install helpers for converting files: npm install wavefile import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; import pkg from 'wavefile' ; const { WaveFile } = pkg ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . TEXT ] }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send Audio Chunk const fileBuffer = fs . readFileSync ( "sample.wav" ); // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono) const wav = new WaveFile (); wav . fromBuffer ( fileBuffer ); wav . toSampleRate ( 16000 ); wav . toBitDepth ( "16" ); const base64Audio = wav . toBase64 (); // If already in correct format, you can use this: // const fileBuffer = fs.readFileSync("sample.pcm"); // const base64Audio = Buffer.from(fileBuffer).toString('base64'); session . sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . text ) { console . debug ( 'Received text: %s\n' , turn . text ); } else if ( turn . data ) { console . debug ( 'Received inline data: %s\n' , turn . data ); } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); And here is a text-to-audio example. You can receive audio by setting AUDIO as response modality. This example saves the received data as WAV file: Python import asyncio import wave from google import genai client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "AUDIO" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : wf = wave . open ( "audio.wav" , "wb" ) wf . setnchannels ( 1 ) wf . setsampwidth ( 2 ) wf . setframerate ( 24000 ) message = "Hello how are you?" await session . 
send_client_content ( turns = { "role" : "user" , "parts" : [{ "text" : message }]}, turn_complete = True ) async for response in session . receive (): if response . data is not None : wf . writeframes ( response . data ) # Un-comment this code to print audio data info # if response.server_content.model_turn is not None: # print(response.server_content.model_turn.parts[0].inline_data.mime_type) wf . close () if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; import pkg from 'wavefile' ; const { WaveFile } = pkg ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . AUDIO ] }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Hello how are you?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); // Combine audio data strings and save as wave file const combinedAudio = turns . reduce (( acc , turn ) = > { if ( turn . data ) { const buffer = Buffer . from ( turn . data , 'base64' ); const intArray = new Int16Array ( buffer . buffer , buffer . byteOffset , buffer . byteLength / Int16Array . BYTES_PER_ELEMENT ); return acc . concat ( Array . from ( intArray )); } return acc ; }, []); const audioBuffer = new Int16Array ( combinedAudio ); const wf = new WaveFile (); wf . fromScratch ( 1 , 24000 , '16' , audioBuffer ); fs . writeFileSync ( 'output.wav' , wf . toBuffer ()); session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Audio formats Audio data in the Live API is always raw, little-endian, 16-bit PCM. Audio output always uses a sample rate of 24kHz. Input audio is natively 16kHz, but the Live API will resample if needed so any sample rate can be sent. To convey the sample rate of input audio, set the MIME type of each audio-containing Blob to a value like audio/pcm;rate=16000 . Audio transcriptions You can enable transcription of the model's audio output by sending output_audio_transcription in the setup config. The transcription language is inferred from the model's response. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "AUDIO" ], "output_audio_transcription" : {} } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : message = "Hello? Gemini are you there?" await session . 
send_client_content ( turns = { "role" : "user" , "parts" : [{ "text" : message }]}, turn_complete = True ) async for response in session . receive (): if response . server_content . model_turn : print ( "Model turn:" , response . server_content . model_turn ) if response . server_content . output_transcription : print ( "Transcript:" , response . server_content . output_transcription . text ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . AUDIO ], outputAudioTranscription : {} }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Hello how are you?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . outputTranscription ) { console . debug ( 'Received output transcription: %s\n' , turn . serverContent . outputTranscription . text ); } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); You can enable transcription of the audio input by sending input_audio_transcription in setup config. Python import asyncio from pathlib import Path from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ], "input_audio_transcription" : {}, } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : audio_data = Path ( "16000.pcm" ) . read_bytes () await session . send_realtime_input ( audio = types . Blob ( data = audio_data , mime_type = 'audio/pcm;rate=16000' ) ) async for msg in session . receive (): if msg . server_content . input_transcription : print ( 'Transcript:' , msg . server_content . input_transcription . text ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; import pkg from 'wavefile' ; const { WaveFile } = pkg ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . TEXT ], inputAudioTranscription : {} }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . 
shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send Audio Chunk const fileBuffer = fs . readFileSync ( "16000.wav" ); // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono) const wav = new WaveFile (); wav . fromBuffer ( fileBuffer ); wav . toSampleRate ( 16000 ); wav . toBitDepth ( "16" ); const base64Audio = wav . toBase64 (); // If already in correct format, you can use this: // const fileBuffer = fs.readFileSync("sample.pcm"); // const base64Audio = Buffer.from(fileBuffer).toString('base64'); session . sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . outputTranscription ) { console . log ( "Transcription" ) console . log ( turn . serverContent . outputTranscription . text ); } } for ( const turn of turns ) { if ( turn . text ) { console . debug ( 'Received text: %s\n' , turn . text ); } else if ( turn . data ) { console . debug ( 'Received inline data: %s\n' , turn . data ); } else if ( turn . serverContent && turn . serverContent . inputTranscription ) { console . debug ( 'Received input transcription: %s\n' , turn . serverContent . inputTranscription . text ); } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Stream audio and video To see an example of how to use the Live API in a streaming audio and video format, run the "Live API - Get Started" file in the cookbooks repository: View on Colab Change voice and language The Live API models each support a different set of voices. Half-cascade supports Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr. Native audio supports a much longer list (identical to the TTS model list ). You can listen to all the voices in AI Studio . To specify a voice, set the voice name within the speechConfig object as part of the session configuration: Python config = { "response_modalities" : [ "AUDIO" ], "speech_config" : { "voice_config" : { "prebuilt_voice_config" : { "voice_name" : "Kore" }} }, } JavaScript const config = { responseModalities : [ Modality . AUDIO ], speechConfig : { voiceConfig : { prebuiltVoiceConfig : { voiceName : "Kore" } } } }; Note: If you're using the generateContent API, the set of available voices is slightly different. See the audio generation guide for generateContent audio generation voices. The Live API supports multiple languages . To change the language, set the language code within the speechConfig object as part of the session configuration: Python config = { "response_modalities" : [ "AUDIO" ], "speech_config" : { "language_code" : "de-DE" } } JavaScript const config = { responseModalities : [ Modality . 
AUDIO ], speechConfig : { languageCode : "de-DE" } }; Note: Native audio output models automatically choose the appropriate language and don't support explicitly setting the language code. Native audio capabilities The following capabilities are only available with native audio. You can learn more about native audio in Choose a model and audio generation . Note: Native audio models currently have limited tool use support. See Overview of supported tools for details. How to use native audio output To use native audio output, configure one of the native audio models and set response_modalities to AUDIO . See Send and receive audio for a full example. Python model = "gemini-2.5-flash-preview-native-audio-dialog" config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ]) async with client . aio . live . connect ( model = model , config = config ) as session : # Send audio input and receive audio JavaScript const model = 'gemini-2.5-flash-preview-native-audio-dialog' ; const config = { responseModalities : [ Modality . AUDIO ] }; async function main () { const session = await ai . live . connect ({ model : model , config : config , callbacks : ..., }); // Send audio input and receive audio session . close (); } main (); Affective dialog This feature lets Gemini adapt its response style to the input expression and tone. To use affective dialog, set the api version to v1alpha and set enable_affective_dialog to true in the setup message: Python client = genai . Client ( http_options = { "api_version" : "v1alpha" }) config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ], enable_affective_dialog = True ) JavaScript const ai = new GoogleGenAI ({ httpOptions : { "apiVersion" : "v1alpha" } }); const config = { responseModalities : [ Modality . AUDIO ], enableAffectiveDialog : true }; Note that affective dialog is currently only supported by the native audio output models. Proactive audio When this feature is enabled, Gemini can proactively decide not to respond if the content is not relevant. To use it, set the api version to v1alpha and configure the proactivity field in the setup message and set proactive_audio to true : Python client = genai . Client ( http_options = { "api_version" : "v1alpha" }) config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ], proactivity = { 'proactive_audio' : True } ) JavaScript const ai = new GoogleGenAI ({ httpOptions : { "apiVersion" : "v1alpha" } }); const config = { responseModalities : [ Modality . AUDIO ], proactivity : { proactiveAudio : true } } Note that proactive audio is currently only supported by the native audio output models. Native audio output with thinking Native audio output supports thinking capabilities , available via a separate model gemini-2.5-flash-exp-native-audio-thinking-dialog . See Send and receive audio for a full example. Python model = "gemini-2.5-flash-exp-native-audio-thinking-dialog" config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ]) async with client . aio . live . connect ( model = model , config = config ) as session : # Send audio input and receive audio JavaScript const model = 'gemini-2.5-flash-exp-native-audio-thinking-dialog' ; const config = { responseModalities : [ Modality . AUDIO ] }; async function main () { const session = await ai . live . connect ({ model : model , config : config , callbacks : ..., }); // Send audio input and receive audio session . 
close (); } main (); Voice Activity Detection (VAD) Voice Activity Detection (VAD) allows the model to recognize when a person is speaking. This is essential for creating natural conversations, as it allows a user to interrupt the model at any time. When VAD detects an interruption, the ongoing generation is canceled and discarded. Only the information already sent to the client is retained in the session history. The server then sends a BidiGenerateContentServerContent message to report the interruption. The Gemini server then discards any pending function calls and sends a BidiGenerateContentServerContent message with the IDs of the canceled calls. Python async for response in session . receive (): if response . server_content . interrupted is True : # The generation was interrupted # If realtime playback is implemented in your application, # you should stop playing audio and clear queued playback here. JavaScript const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . interrupted ) { // The generation was interrupted // If realtime playback is implemented in your application, // you should stop playing audio and clear queued playback here. } } Automatic VAD By default, the model automatically performs VAD on a continuous audio input stream. VAD can be configured with the realtimeInputConfig.automaticActivityDetection field of the setup configuration . When the audio stream is paused for more than a second (for example, because the user switched off the microphone), an audioStreamEnd event should be sent to flush any cached audio. The client can resume sending audio data at any time. Python # example audio file to try: # URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm" # !wget -q $URL -O sample.pcm import asyncio from pathlib import Path from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" config = { "response_modalities" : [ "TEXT" ]} async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : audio_bytes = Path ( "sample.pcm" ) . read_bytes () await session . send_realtime_input ( audio = types . Blob ( data = audio_bytes , mime_type = "audio/pcm;rate=16000" ) ) # if stream gets paused, send: # await session.send_realtime_input(audio_stream_end=True) async for response in session . receive (): if response . text is not None : print ( response . text ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript // example audio file to try: // URL = "https://storage.googleapis.com/generativeai-downloads/data/hello_are_you_there.pcm" // !wget -q $URL -O sample.pcm import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const config = { responseModalities : [ Modality . TEXT ] }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . 
turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send Audio Chunk const fileBuffer = fs . readFileSync ( "sample.pcm" ); const base64Audio = Buffer . from ( fileBuffer ). toString ( 'base64' ); session . sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); // if stream gets paused, send: // session.sendRealtimeInput({ audioStreamEnd: true }) const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . text ) { console . debug ( 'Received text: %s\n' , turn . text ); } else if ( turn . data ) { console . debug ( 'Received inline data: %s\n' , turn . data ); } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); With send_realtime_input , the API will respond to audio automatically based on VAD. While send_client_content adds messages to the model context in order, send_realtime_input is optimized for responsiveness at the expense of deterministic ordering. Automatic VAD configuration For more control over the VAD activity, you can configure the following parameters. See API reference for more info. Python from google.genai import types config = { "response_modalities" : [ "TEXT" ], "realtime_input_config" : { "automatic_activity_detection" : { "disabled" : False , # default "start_of_speech_sensitivity" : types . StartSensitivity . START_SENSITIVITY_LOW , "end_of_speech_sensitivity" : types . EndSensitivity . END_SENSITIVITY_LOW , "prefix_padding_ms" : 20 , "silence_duration_ms" : 100 , } } } JavaScript import { GoogleGenAI , Modality , StartSensitivity , EndSensitivity } from '@google/genai' ; const config = { responseModalities : [ Modality . TEXT ], realtimeInputConfig : { automaticActivityDetection : { disabled : false , // default startOfSpeechSensitivity : StartSensitivity . START_SENSITIVITY_LOW , endOfSpeechSensitivity : EndSensitivity . END_SENSITIVITY_LOW , prefixPaddingMs : 20 , silenceDurationMs : 100 , } } }; Disable automatic VAD Alternatively, the automatic VAD can be disabled by setting realtimeInputConfig.automaticActivityDetection.disabled to true in the setup message. In this configuration the client is responsible for detecting user speech and sending activityStart and activityEnd messages at the appropriate times. An audioStreamEnd isn't sent in this configuration. Instead, any interruption of the stream is marked by an activityEnd message. Python config = { "response_modalities" : [ "TEXT" ], "realtime_input_config" : { "automatic_activity_detection" : { "disabled" : True }}, } async with client . aio . live . connect ( model = model , config = config ) as session : # ... await session . send_realtime_input ( activity_start = types . ActivityStart ()) await session . send_realtime_input ( audio = types . Blob ( data = audio_bytes , mime_type = "audio/pcm;rate=16000" ) ) await session . send_realtime_input ( activity_end = types . ActivityEnd ()) # ... JavaScript const config = { responseModalities : [ Modality . TEXT ], realtimeInputConfig : { automaticActivityDetection : { disabled : true , } } }; session . sendRealtimeInput ({ activityStart : {} }) session . 
sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); session . sendRealtimeInput ({ activityEnd : {} }) Token count You can find the total number of consumed tokens in the usageMetadata field of the returned server message. Python async for message in session . receive (): # The server will periodically send messages that include UsageMetadata. if message . usage_metadata : usage = message . usage_metadata print ( f "Used { usage . total_token_count } tokens in total. Response token breakdown:" ) for detail in usage . response_tokens_details : match detail : case types . ModalityTokenCount ( modality = modality , token_count = count ): print ( f " { modality } : { count } " ) JavaScript const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . usageMetadata ) { console . debug ( 'Used %s tokens in total. Response token breakdown:\n' , turn . usageMetadata . totalTokenCount ); for ( const detail of turn . usageMetadata . responseTokensDetails ) { console . debug ( '%s\n' , detail ); } } } Media resolution You can specify the media resolution for the input media by setting the mediaResolution field as part of the session configuration: Python from google.genai import types config = { "response_modalities" : [ "AUDIO" ], "media_resolution" : types . MediaResolution . MEDIA_RESOLUTION_LOW , } JavaScript import { GoogleGenAI , Modality , MediaResolution } from '@google/genai' ; const config = { responseModalities : [ Modality . TEXT ], mediaResolution : MediaResolution . MEDIA_RESOLUTION_LOW , }; Limitations Consider the following limitations of the Live API when you plan your project. Response modalities You can only set one response modality ( TEXT or AUDIO ) per session in the session configuration. Setting both results in a config error message. This means that you can configure the model to respond with either text or audio, but not both in the same session. Client authentication The Live API only provides server-to-server authentication by default. If you're implementing your Live API application using a client-to-server approach , you need to use ephemeral tokens to mitigate security risks. Session duration Audio-only sessions are limited to 15 minutes, and audio plus video sessions are limited to 2 minutes. However, you can configure different session management techniques for unlimited extensions on session duration. Context window A session has a context window limit of: 128k tokens for native audio output models 32k tokens for other Live API models Supported languages Live API supports the following languages. Note: Native audio output models automatically choose the appropriate language and don't support explicitly setting the language code. 
Language: BCP-47 Code
German (Germany): de-DE
English (Australia)*: en-AU
English (UK)*: en-GB
English (India): en-IN
English (US): en-US
Spanish (US): es-US
French (France): fr-FR
Hindi (India): hi-IN
Portuguese (Brazil): pt-BR
Arabic (Generic): ar-XA
Spanish (Spain)*: es-ES
French (Canada)*: fr-CA
Indonesian (Indonesia): id-ID
Italian (Italy): it-IT
Japanese (Japan): ja-JP
Turkish (Turkey): tr-TR
Vietnamese (Vietnam): vi-VN
Bengali (India): bn-IN
Gujarati (India)*: gu-IN
Kannada (India)*: kn-IN
Marathi (India): mr-IN
Malayalam (India)*: ml-IN
Tamil (India): ta-IN
Telugu (India): te-IN
Dutch (Netherlands): nl-NL
Korean (South Korea): ko-KR
Mandarin Chinese (China)*: cmn-CN
Polish (Poland): pl-PL
Russian (Russia): ru-RU
Thai (Thailand): th-TH
Languages marked with an asterisk (*) are not available for Native audio. What's next Read the Tool Use and Session Management guides for essential information on using the Live API effectively. Try the Live API in Google AI Studio . For more info about the Live API models, see Gemini 2.0 Flash Live and Gemini 2.5 Flash Native Audio on the Models page. Try more examples in the Live API cookbook , the Live API Tools cookbook , and the Live API Get Started script . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-10 UTC.
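The interruption and token-count signals described above arrive on the same receive() stream as the rest of the server messages, so one loop can watch for both. Below is a minimal Python sketch of that pattern, built only from fields shown in this guide (server_content.interrupted, usage_metadata, turn_complete); the sample.pcm file name is an illustrative assumption, real playback handling is omitted, and this is a sketch rather than a canonical implementation.

import asyncio
from pathlib import Path
from google import genai
from google.genai import types

client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["AUDIO"]}  # automatic VAD stays enabled by default

async def main():
    # 16 kHz, 16-bit mono PCM, as in the examples above (illustrative file name).
    audio_bytes = Path("sample.pcm").read_bytes()
    async with client.aio.live.connect(model=model, config=config) as session:
        await session.send_realtime_input(
            audio=types.Blob(data=audio_bytes, mime_type="audio/pcm;rate=16000")
        )
        # Flush cached audio once the (simulated) microphone stream stops.
        await session.send_realtime_input(audio_stream_end=True)

        total_tokens = 0
        async for message in session.receive():
            sc = message.server_content
            if sc and sc.interrupted:
                # VAD detected the user speaking over the model: stop local
                # playback and clear any queued audio here.
                print("Generation interrupted")
            if message.usage_metadata:
                total_tokens = message.usage_metadata.total_token_count
            if sc and sc.turn_complete:
                break
        print(f"Tokens used so far: {total_tokens}")

if __name__ == "__main__":
    asyncio.run(main())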
text_content/docs_live-session_21ef6dd5.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/live-session#session-resumption
2
  Title: Session management with Live API  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Session management with Live API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Session management with Live API In the Live API, a session refers to a persistent connection where input and output are streamed continuously over the same connection (read more about how it works ). This unique session design enables low latency and supports unique features, but can also introduce challenges, like session time limits, and early termination. This guide covers strategies for overcoming the session management challenges that can arise when using the Live API. Session lifetime Without compression, audio-only sessions are limited to 15 minutes, and audio-video sessions are limited to 2 minutes. Exceeding these limits will terminate the session (and therefore, the connection), but you can use context window compression to extend sessions to an unlimited amount of time. The lifetime of a connection is limited as well, to around 10 minutes. When the connection terminates, the session terminates as well. In this case, you can configure a single session to stay active over multiple connections using session resumption . You'll also receive a GoAway message before the connection ends, allowing you to take further actions. Context window compression To enable longer sessions, and avoid abrupt connection termination, you can enable context window compression by setting the contextWindowCompression field as part of the session configuration. In the ContextWindowCompressionConfig , you can configure a sliding-window mechanism and the number of tokens that triggers compression. Python from google.genai import types config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ], context_window_compression = ( # Configures compression with default parameters. types . ContextWindowCompressionConfig ( sliding_window = types . SlidingWindow (), ) ), ) JavaScript const config = { responseModalities : [ Modality . AUDIO ], contextWindowCompression : { slidingWindow : {} } }; Session resumption To prevent session termination when the server periodically resets the WebSocket connection, configure the sessionResumption field within the setup configuration . Passing this configuration causes the server to send SessionResumptionUpdate messages, which can be used to resume the session by passing the last resumption token as the SessionResumptionConfig.handle of the subsequent connection. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" async def main (): print ( f "Connecting to the service with handle { previous_session_handle } ..." ) async with client . aio . live . connect ( model = model , config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ], session_resumption = types . SessionResumptionConfig ( # The handle of the session to resume is passed here, # or else None to start a new session. handle = previous_session_handle ), ), ) as session : while True : await session . send_client_content ( turns = types . Content ( role = "user" , parts = [ types . Part ( text = "Hello world!" )] ) ) async for message in session . 
receive (): # Periodically, the server will send update messages that may # contain a handle for the current state of the session. if message . session_resumption_update : update = message . session_resumption_update if update . resumable and update . new_handle : # The handle should be retained and linked to the session. return update . new_handle # For the purposes of this example, placeholder input is continually fed # to the model. In non-sample code, the model inputs would come from # the user. if message . server_content and message . server_content . turn_complete : break if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } console . debug ( 'Connecting to the service with handle %s...' , previousSessionHandle ) const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : { responseModalities : [ Modality . TEXT ], sessionResumption : { handle : previousSessionHandle } // The handle of the session to resume is passed here, or else null to start a new session. } }); const inputTurns = 'Hello how are you?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . sessionResumptionUpdate ) { if ( turn . sessionResumptionUpdate . resumable && turn . sessionResumptionUpdate . newHandle ) { let newHandle = turn . sessionResumptionUpdate . newHandle // ...Store newHandle and start new session with this handle here } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Receiving a message before the session disconnects The server sends a GoAway message that signals that the current connection will soon be terminated. This message includes the timeLeft , indicating the remaining time and lets you take further action before the connection will be terminated as ABORTED. Python async for response in session . receive (): if response . go_away is not None : # The connection will soon be terminated print ( response . go_away . time_left ) JavaScript const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . goAway ) { console . debug ( 'Time left: %s\n' , turn . goAway . timeLeft ); } } Receiving a message when the generation is complete The server sends a generationComplete message that signals that the model finished generating the response. Python async for response in session . receive (): if response . server_content . 
generation_complete is True : # The generation is complete JavaScript const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . generationComplete ) { // The generation is complete } } What's next Explore more ways to work with the Live API in the full Capabilities guide, the Tool use page, or the Live API cookbook . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-08 UTC.
 
2
  Title: Session management with Live API  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Session management with Live API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Session management with Live API In the Live API, a session refers to a persistent connection where input and output are streamed continuously over the same connection (read more about how it works ). This unique session design enables low latency and supports unique features, but can also introduce challenges, like session time limits, and early termination. This guide covers strategies for overcoming the session management challenges that can arise when using the Live API. Session lifetime Without compression, audio-only sessions are limited to 15 minutes, and audio-video sessions are limited to 2 minutes. Exceeding these limits will terminate the session (and therefore, the connection), but you can use context window compression to extend sessions to an unlimited amount of time. The lifetime of a connection is limited as well, to around 10 minutes. When the connection terminates, the session terminates as well. In this case, you can configure a single session to stay active over multiple connections using session resumption . You'll also receive a GoAway message before the connection ends, allowing you to take further actions. Context window compression To enable longer sessions, and avoid abrupt connection termination, you can enable context window compression by setting the contextWindowCompression field as part of the session configuration. In the ContextWindowCompressionConfig , you can configure a sliding-window mechanism and the number of tokens that triggers compression. Python from google.genai import types config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ], context_window_compression = ( # Configures compression with default parameters. types . ContextWindowCompressionConfig ( sliding_window = types . SlidingWindow (), ) ), ) JavaScript const config = { responseModalities : [ Modality . AUDIO ], contextWindowCompression : { slidingWindow : {} } }; Session resumption To prevent session termination when the server periodically resets the WebSocket connection, configure the sessionResumption field within the setup configuration . Passing this configuration causes the server to send SessionResumptionUpdate messages, which can be used to resume the session by passing the last resumption token as the SessionResumptionConfig.handle of the subsequent connection. Resumption tokens are valid for 2 hr after the last sessions termination. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" async def main (): print ( f "Connecting to the service with handle { previous_session_handle } ..." ) async with client . aio . live . connect ( model = model , config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ], session_resumption = types . SessionResumptionConfig ( # The handle of the session to resume is passed here, # or else None to start a new session. handle = previous_session_handle ), ), ) as session : while True : await session . send_client_content ( turns = types . Content ( role = "user" , parts = [ types . Part ( text = "Hello world!" 
)] ) ) async for message in session . receive (): # Periodically, the server will send update messages that may # contain a handle for the current state of the session. if message . session_resumption_update : update = message . session_resumption_update if update . resumable and update . new_handle : # The handle should be retained and linked to the session. return update . new_handle # For the purposes of this example, placeholder input is continually fed # to the model. In non-sample code, the model inputs would come from # the user. if message . server_content and message . server_content . turn_complete : break if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } console . debug ( 'Connecting to the service with handle %s...' , previousSessionHandle ) const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : { responseModalities : [ Modality . TEXT ], sessionResumption : { handle : previousSessionHandle } // The handle of the session to resume is passed here, or else null to start a new session. } }); const inputTurns = 'Hello how are you?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . sessionResumptionUpdate ) { if ( turn . sessionResumptionUpdate . resumable && turn . sessionResumptionUpdate . newHandle ) { let newHandle = turn . sessionResumptionUpdate . newHandle // ...Store newHandle and start new session with this handle here } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Receiving a message before the session disconnects The server sends a GoAway message that signals that the current connection will soon be terminated. This message includes the timeLeft , indicating the remaining time and lets you take further action before the connection will be terminated as ABORTED. Python async for response in session . receive (): if response . go_away is not None : # The connection will soon be terminated print ( response . go_away . time_left ) JavaScript const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . goAway ) { console . debug ( 'Time left: %s\n' , turn . goAway . timeLeft ); } } Receiving a message when the generation is complete The server sends a generationComplete message that signals that the model finished generating the response. Python async for response in session . receive (): if response . server_content . 
generation_complete is True : # The generation is complete JavaScript const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . generationComplete ) { // The generation is complete } } What's next Explore more ways to work with the Live API in the full Capabilities guide, the Tool use page, or the Live API cookbook . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
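Since SessionResumptionUpdate messages can arrive at any point during a session, a practical pattern is to keep overwriting a stored handle with the newest one and pass it to the next connect() call. The following Python sketch is built only from the fields shown above; it keeps the handle in a local variable for illustration (a real application would persist it somewhere durable), and the two-iteration loop is just a stand-in for reconnecting after a disconnect.

import asyncio
from google import genai
from google.genai import types

client = genai.Client()
model = "gemini-live-2.5-flash-preview"

async def run_one_connection(handle):
    # Runs a single connection and returns the newest resumption handle seen.
    newest = handle
    config = types.LiveConnectConfig(
        response_modalities=["TEXT"],
        session_resumption=types.SessionResumptionConfig(handle=handle),
    )
    async with client.aio.live.connect(model=model, config=config) as session:
        await session.send_client_content(
            turns=types.Content(role="user", parts=[types.Part(text="Hello!")])
        )
        async for message in session.receive():
            update = message.session_resumption_update
            if update and update.resumable and update.new_handle:
                newest = update.new_handle  # always keep the most recent handle
            if message.server_content and message.server_content.turn_complete:
                break
    return newest

async def main():
    handle = None  # None starts a brand-new session
    for _ in range(2):  # each pass resumes the same logical session
        handle = await run_one_connection(handle)

if __name__ == "__main__":
    asyncio.run(main())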
text_content/docs_live-session_910f1837.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/live-session#goaway-message
2
  Title: Session management with Live API  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Session management with Live API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Session management with Live API In the Live API, a session refers to a persistent connection where input and output are streamed continuously over the same connection (read more about how it works ). This unique session design enables low latency and supports unique features, but can also introduce challenges, like session time limits, and early termination. This guide covers strategies for overcoming the session management challenges that can arise when using the Live API. Session lifetime Without compression, audio-only sessions are limited to 15 minutes, and audio-video sessions are limited to 2 minutes. Exceeding these limits will terminate the session (and therefore, the connection), but you can use context window compression to extend sessions to an unlimited amount of time. The lifetime of a connection is limited as well, to around 10 minutes. When the connection terminates, the session terminates as well. In this case, you can configure a single session to stay active over multiple connections using session resumption . You'll also receive a GoAway message before the connection ends, allowing you to take further actions. Context window compression To enable longer sessions, and avoid abrupt connection termination, you can enable context window compression by setting the contextWindowCompression field as part of the session configuration. In the ContextWindowCompressionConfig , you can configure a sliding-window mechanism and the number of tokens that triggers compression. Python from google.genai import types config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ], context_window_compression = ( # Configures compression with default parameters. types . ContextWindowCompressionConfig ( sliding_window = types . SlidingWindow (), ) ), ) JavaScript const config = { responseModalities : [ Modality . AUDIO ], contextWindowCompression : { slidingWindow : {} } }; Session resumption To prevent session termination when the server periodically resets the WebSocket connection, configure the sessionResumption field within the setup configuration . Passing this configuration causes the server to send SessionResumptionUpdate messages, which can be used to resume the session by passing the last resumption token as the SessionResumptionConfig.handle of the subsequent connection. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" async def main (): print ( f "Connecting to the service with handle { previous_session_handle } ..." ) async with client . aio . live . connect ( model = model , config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ], session_resumption = types . SessionResumptionConfig ( # The handle of the session to resume is passed here, # or else None to start a new session. handle = previous_session_handle ), ), ) as session : while True : await session . send_client_content ( turns = types . Content ( role = "user" , parts = [ types . Part ( text = "Hello world!" )] ) ) async for message in session . 
receive (): # Periodically, the server will send update messages that may # contain a handle for the current state of the session. if message . session_resumption_update : update = message . session_resumption_update if update . resumable and update . new_handle : # The handle should be retained and linked to the session. return update . new_handle # For the purposes of this example, placeholder input is continually fed # to the model. In non-sample code, the model inputs would come from # the user. if message . server_content and message . server_content . turn_complete : break if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } console . debug ( 'Connecting to the service with handle %s...' , previousSessionHandle ) const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : { responseModalities : [ Modality . TEXT ], sessionResumption : { handle : previousSessionHandle } // The handle of the session to resume is passed here, or else null to start a new session. } }); const inputTurns = 'Hello how are you?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . sessionResumptionUpdate ) { if ( turn . sessionResumptionUpdate . resumable && turn . sessionResumptionUpdate . newHandle ) { let newHandle = turn . sessionResumptionUpdate . newHandle // ...Store newHandle and start new session with this handle here } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Receiving a message before the session disconnects The server sends a GoAway message that signals that the current connection will soon be terminated. This message includes the timeLeft , indicating the remaining time and lets you take further action before the connection will be terminated as ABORTED. Python async for response in session . receive (): if response . go_away is not None : # The connection will soon be terminated print ( response . go_away . time_left ) JavaScript const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . goAway ) { console . debug ( 'Time left: %s\n' , turn . goAway . timeLeft ); } } Receiving a message when the generation is complete The server sends a generationComplete message that signals that the model finished generating the response. Python async for response in session . receive (): if response . server_content . 
generation_complete is True : # The generation is complete JavaScript const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . generationComplete ) { // The generation is complete } } What's next Explore more ways to work with the Live API in the full Capabilities guide, the Tool use page, or the Live API cookbook . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-08 UTC.
 
2
  Title: Session management with Live API  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Session management with Live API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Session management with Live API In the Live API, a session refers to a persistent connection where input and output are streamed continuously over the same connection (read more about how it works ). This unique session design enables low latency and supports unique features, but can also introduce challenges, like session time limits, and early termination. This guide covers strategies for overcoming the session management challenges that can arise when using the Live API. Session lifetime Without compression, audio-only sessions are limited to 15 minutes, and audio-video sessions are limited to 2 minutes. Exceeding these limits will terminate the session (and therefore, the connection), but you can use context window compression to extend sessions to an unlimited amount of time. The lifetime of a connection is limited as well, to around 10 minutes. When the connection terminates, the session terminates as well. In this case, you can configure a single session to stay active over multiple connections using session resumption . You'll also receive a GoAway message before the connection ends, allowing you to take further actions. Context window compression To enable longer sessions, and avoid abrupt connection termination, you can enable context window compression by setting the contextWindowCompression field as part of the session configuration. In the ContextWindowCompressionConfig , you can configure a sliding-window mechanism and the number of tokens that triggers compression. Python from google.genai import types config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ], context_window_compression = ( # Configures compression with default parameters. types . ContextWindowCompressionConfig ( sliding_window = types . SlidingWindow (), ) ), ) JavaScript const config = { responseModalities : [ Modality . AUDIO ], contextWindowCompression : { slidingWindow : {} } }; Session resumption To prevent session termination when the server periodically resets the WebSocket connection, configure the sessionResumption field within the setup configuration . Passing this configuration causes the server to send SessionResumptionUpdate messages, which can be used to resume the session by passing the last resumption token as the SessionResumptionConfig.handle of the subsequent connection. Resumption tokens are valid for 2 hr after the last sessions termination. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" async def main (): print ( f "Connecting to the service with handle { previous_session_handle } ..." ) async with client . aio . live . connect ( model = model , config = types . LiveConnectConfig ( response_modalities = [ "AUDIO" ], session_resumption = types . SessionResumptionConfig ( # The handle of the session to resume is passed here, # or else None to start a new session. handle = previous_session_handle ), ), ) as session : while True : await session . send_client_content ( turns = types . Content ( role = "user" , parts = [ types . Part ( text = "Hello world!" 
)] ) ) async for message in session . receive (): # Periodically, the server will send update messages that may # contain a handle for the current state of the session. if message . session_resumption_update : update = message . session_resumption_update if update . resumable and update . new_handle : # The handle should be retained and linked to the session. return update . new_handle # For the purposes of this example, placeholder input is continually fed # to the model. In non-sample code, the model inputs would come from # the user. if message . server_content and message . server_content . turn_complete : break if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } } return turns ; } console . debug ( 'Connecting to the service with handle %s...' , previousSessionHandle ) const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : { responseModalities : [ Modality . TEXT ], sessionResumption : { handle : previousSessionHandle } // The handle of the session to resume is passed here, or else null to start a new session. } }); const inputTurns = 'Hello how are you?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . sessionResumptionUpdate ) { if ( turn . sessionResumptionUpdate . resumable && turn . sessionResumptionUpdate . newHandle ) { let newHandle = turn . sessionResumptionUpdate . newHandle // ...Store newHandle and start new session with this handle here } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Receiving a message before the session disconnects The server sends a GoAway message that signals that the current connection will soon be terminated. This message includes the timeLeft , indicating the remaining time and lets you take further action before the connection will be terminated as ABORTED. Python async for response in session . receive (): if response . go_away is not None : # The connection will soon be terminated print ( response . go_away . time_left ) JavaScript const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . goAway ) { console . debug ( 'Time left: %s\n' , turn . goAway . timeLeft ); } } Receiving a message when the generation is complete The server sends a generationComplete message that signals that the model finished generating the response. Python async for response in session . receive (): if response . server_content . 
generation_complete is True : # The generation is complete JavaScript const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . generationComplete ) { // The generation is complete } } What's next Explore more ways to work with the Live API in the full Capabilities guide, the Tool use page, or the Live API cookbook . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
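One way to react to the GoAway and generationComplete signals is to treat GoAway as a cue to wrap up the current turn and reconnect (ideally with a stored resumption handle) instead of waiting for the ABORTED termination. The Python sketch below only prints the signals; what you actually do with them (draining playback, reconnecting) is application-specific, and the prompt text is an arbitrary example.

import asyncio
from google import genai
from google.genai import types

client = genai.Client()
model = "gemini-live-2.5-flash-preview"
config = {"response_modalities": ["TEXT"]}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        await session.send_client_content(
            turns=types.Content(role="user", parts=[types.Part(text="Tell me a short story.")])
        )
        async for message in session.receive():
            if message.go_away is not None:
                # The server will close this connection soon (for example, at the
                # connection lifetime limit); time_left says how long remains.
                print("GoAway received, time left:", message.go_away.time_left)
            sc = message.server_content
            if sc and sc.generation_complete:
                # The model has finished generating this response.
                print("generationComplete")
            if message.text is not None:
                print(message.text, end="")
            if sc and sc.turn_complete:
                break

if __name__ == "__main__":
    asyncio.run(main())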
text_content/docs_live-tools_141b2194.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/live-tools
2
  Title: Tool use with Live API  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Tool use with Live API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Tool use with Live API Tool use allows Live API to go beyond just conversation by enabling it to perform actions in the real-world and pull in external context while maintaining a real time connection. You can define tools such as Function calling , Code execution , and Google Search with the Live API. Overview of supported tools Here's a brief overview of the available tools for each model: Tool Cascaded models gemini-live-2.5-flash-preview gemini-2.0-flash-live-001 gemini-2.5-flash-preview-native-audio-dialog gemini-2.5-flash-exp-native-audio-thinking-dialog Search Yes Yes Yes Function calling Yes Yes No Code execution Yes No No Url context Yes No No Function calling Live API supports function calling, just like regular content generation requests. Function calling lets the Live API interact with external data and programs, greatly increasing what your applications can accomplish. You can define function declarations as part of the session configuration. After receiving tool calls, the client should respond with a list of FunctionResponse objects using the session.send_tool_response method. See the Function calling tutorial to learn more. Note: Unlike the generateContent API, the Live API doesn't support automatic tool response handling. You must handle tool responses manually in your client code. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" # Simple function definitions turn_on_the_lights = { "name" : "turn_on_the_lights" } turn_off_the_lights = { "name" : "turn_off_the_lights" } tools = [{ "function_declarations" : [ turn_on_the_lights , turn_off_the_lights ]}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "Turn on the lights please" await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) elif chunk . tool_call : function_responses = [] for fc in chunk . tool_call . function_calls : function_response = types . FunctionResponse ( id = fc . id , name = fc . name , response = { "result" : "ok" } # simple, hard-coded function response ) function_responses . append ( function_response ) await session . send_tool_response ( function_responses = function_responses ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; // Simple function definitions const turn_on_the_lights = { name : "turn_on_the_lights" } // , description: '...', parameters: { ... } const turn_off_the_lights = { name : "turn_off_the_lights" } const tools = [{ functionDeclarations : [ turn_on_the_lights , turn_off_the_lights ] }] const config = { responseModalities : [ Modality . 
TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Turn on the lights please' ; session . sendClientContent ({ turns : inputTurns }); let turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } } } else if ( turn . toolCall ) { const functionResponses = []; for ( const fc of turn . toolCall . functionCalls ) { functionResponses . push ({ id : fc . id , name : fc . name , response : { result : "ok" } // simple, hard-coded function response }); } console . debug ( 'Sending tool response...\n' ); session . sendToolResponse ({ functionResponses : functionResponses }); } } // Check again for new messages turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); From a single prompt, the model can generate multiple function calls and the code necessary to chain their outputs. This code executes in a sandbox environment, generating subsequent BidiGenerateContentToolCall messages. Asynchronous function calling Note: Asynchronous function calling is only supported in half-cascade audio generation. Function calling executes sequentially by default, meaning execution pauses until the results of each function call are available. This ensures sequential processing, which means you won't be able to continue interacting with the model while the functions are being run. If you don't want to block the conversation, you can tell the model to run the functions asynchronously. 
To do so, you first need to add a behavior to the function definitions: Python # Non-blocking function definitions turn_on_the_lights = { "name" : "turn_on_the_lights" , "behavior" : "NON_BLOCKING" } # turn_on_the_lights will run asynchronously turn_off_the_lights = { "name" : "turn_off_the_lights" } # turn_off_the_lights will still pause all interactions with the model JavaScript import { GoogleGenAI , Modality , Behavior } from '@google/genai' ; // Non-blocking function definitions const turn_on_the_lights = { name : "turn_on_the_lights" , behavior : Behavior . NON_BLOCKING } // Blocking function definitions const turn_off_the_lights = { name : "turn_off_the_lights" } const tools = [{ functionDeclarations : [ turn_on_the_lights , turn_off_the_lights ] }] NON-BLOCKING ensures the function runs asynchronously while you can continue interacting with the model. Then you need to tell the model how to behave when it receives the FunctionResponse using the scheduling parameter. It can either: Interrupt what it's doing and tell you about the response it got right away ( scheduling="INTERRUPT" ), Wait until it's finished with what it's currently doing ( scheduling="WHEN_IDLE" ), Or do nothing and use that knowledge later on in the discussion ( scheduling="SILENT" ) Python # for a non-blocking function definition, apply scheduling in the function response: function_response = types . FunctionResponse ( id = fc . id , name = fc . name , response = { "result" : "ok" , "scheduling" : "INTERRUPT" # Can also be WHEN_IDLE or SILENT } ) JavaScript import { GoogleGenAI , Modality , Behavior , FunctionResponseScheduling } from '@google/genai' ; // for a non-blocking function definition, apply scheduling in the function response: const functionResponse = { id : fc . id , name : fc . name , response : { result : "ok" , scheduling : FunctionResponseScheduling . INTERRUPT // Can also be WHEN_IDLE or SILENT } } Code execution You can define code execution as part of the session configuration. This lets the Live API generate and execute Python code and dynamically perform computations to benefit your results. See the Code execution tutorial to learn more. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" tools = [{ 'code_execution' : {}}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "Compute the largest prime palindrome under 100000." await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) model_turn = chunk . server_content . model_turn if model_turn : for part in model_turn . parts : if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . output ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const tools = [{ codeExecution : {}}] const config = { responseModalities : [ Modality . TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . 
shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Compute the largest prime palindrome under 100000.' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } else if ( part . executableCode ) { console . debug ( 'executableCode: %s\n' , part . executableCode . code ); } else if ( part . codeExecutionResult ) { console . debug ( 'codeExecutionResult: %s\n' , part . codeExecutionResult . output ); } } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Grounding with Google Search You can enable Grounding with Google Search as part of the session configuration. This increases the Live API's accuracy and prevents hallucinations. See the Grounding tutorial to learn more. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" tools = [{ 'google_search' : {}}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "When did the last Brazil vs. Argentina soccer match happen?" await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) # The model might generate and execute Python code to use Search model_turn = chunk . server_content . model_turn if model_turn : for part in model_turn . parts : if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . output ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const tools = [{ googleSearch : {}}] const config = { responseModalities : [ Modality . TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! 
done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'When did the last Brazil vs. Argentina soccer match happen?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } else if ( part . executableCode ) { console . debug ( 'executableCode: %s\n' , part . executableCode . code ); } else if ( part . codeExecutionResult ) { console . debug ( 'codeExecutionResult: %s\n' , part . codeExecutionResult . output ); } } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Combining multiple tools You can combine multiple tools within the Live API, increasing your application's capabilities even more: Python prompt = """ Hey, I need you to do three things for me. 1. Compute the largest prime palindrome under 100000. 2. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024? 3. Turn on the lights Thanks! """ tools = [ { "google_search" : {}}, { "code_execution" : {}}, { "function_declarations" : [ turn_on_the_lights , turn_off_the_lights ]}, ] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } # ... remaining model call JavaScript const prompt = `Hey, I need you to do three things for me. 1. Compute the largest prime palindrome under 100000. 2. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024? 3. Turn on the lights Thanks! ` const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turn_on_the_lights , turn_off_the_lights ] } ] const config = { responseModalities : [ Modality . TEXT ], tools : tools } // ... remaining model call What's next Check out more examples of using tools with the Live API in the Tool use cookbook . Get the full story on features and configurations from the Live API Capabilities guide . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-08 UTC.
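The function calling examples in this guide return a hard-coded {"result": "ok"}. A common next step is to map each incoming call to a real Python function and return its result through send_tool_response. The sketch below does that with an illustrative dispatch table; the light-control functions are placeholders that take no arguments, so any model-supplied arguments are ignored here, and the structure otherwise mirrors the receive loop shown above.

import asyncio
from google import genai
from google.genai import types

client = genai.Client()
model = "gemini-live-2.5-flash-preview"

# Placeholder implementations for the declared tools (illustrative only).
def turn_on_the_lights():
    return {"status": "lights on"}

def turn_off_the_lights():
    return {"status": "lights off"}

FUNCTIONS = {
    "turn_on_the_lights": turn_on_the_lights,
    "turn_off_the_lights": turn_off_the_lights,
}

tools = [{"function_declarations": [{"name": name} for name in FUNCTIONS]}]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        await session.send_client_content(
            turns={"parts": [{"text": "Turn on the lights please"}]}
        )
        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)
            elif chunk.tool_call:
                function_responses = []
                for fc in chunk.tool_call.function_calls:
                    handler = FUNCTIONS.get(fc.name)
                    result = handler() if handler else {"error": f"unknown tool {fc.name}"}
                    function_responses.append(
                        types.FunctionResponse(id=fc.id, name=fc.name, response=result)
                    )
                await session.send_tool_response(function_responses=function_responses)

if __name__ == "__main__":
    asyncio.run(main())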
 
2
  Title: Tool use with Live API  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Tool use with Live API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Tool use with Live API Tool use allows Live API to go beyond just conversation by enabling it to perform actions in the real-world and pull in external context while maintaining a real time connection. You can define tools such as Function calling , Code execution , and Google Search with the Live API. Overview of supported tools Here's a brief overview of the available tools for each model: Tool Cascaded models gemini-live-2.5-flash-preview gemini-2.0-flash-live-001 gemini-2.5-flash-preview-native-audio-dialog gemini-2.5-flash-exp-native-audio-thinking-dialog Search Yes Yes Yes Function calling Yes Yes No Code execution Yes No No URL context Yes No No Function calling Live API supports function calling, just like regular content generation requests. Function calling lets the Live API interact with external data and programs, greatly increasing what your applications can accomplish. You can define function declarations as part of the session configuration. After receiving tool calls, the client should respond with a list of FunctionResponse objects using the session.send_tool_response method. See the Function calling tutorial to learn more. Note: Unlike the generateContent API, the Live API doesn't support automatic tool response handling. You must handle tool responses manually in your client code. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" # Simple function definitions turn_on_the_lights = { "name" : "turn_on_the_lights" } turn_off_the_lights = { "name" : "turn_off_the_lights" } tools = [{ "function_declarations" : [ turn_on_the_lights , turn_off_the_lights ]}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "Turn on the lights please" await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) elif chunk . tool_call : function_responses = [] for fc in chunk . tool_call . function_calls : function_response = types . FunctionResponse ( id = fc . id , name = fc . name , response = { "result" : "ok" } # simple, hard-coded function response ) function_responses . append ( function_response ) await session . send_tool_response ( function_responses = function_responses ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; // Simple function definitions const turn_on_the_lights = { name : "turn_on_the_lights" } // , description: '...', parameters: { ... } const turn_off_the_lights = { name : "turn_off_the_lights" } const tools = [{ functionDeclarations : [ turn_on_the_lights , turn_off_the_lights ] }] const config = { responseModalities : [ Modality . 
TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Turn on the lights please' ; session . sendClientContent ({ turns : inputTurns }); let turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } } } else if ( turn . toolCall ) { const functionResponses = []; for ( const fc of turn . toolCall . functionCalls ) { functionResponses . push ({ id : fc . id , name : fc . name , response : { result : "ok" } // simple, hard-coded function response }); } console . debug ( 'Sending tool response...\n' ); session . sendToolResponse ({ functionResponses : functionResponses }); } } // Check again for new messages turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); From a single prompt, the model can generate multiple function calls and the code necessary to chain their outputs. This code executes in a sandbox environment, generating subsequent BidiGenerateContentToolCall messages. Asynchronous function calling Note: Asynchronous function calling is only supported in half-cascade audio generation. Function calling executes sequentially by default, meaning execution pauses until the results of each function call are available. This ensures sequential processing, which means you won't be able to continue interacting with the model while the functions are being run. If you don't want to block the conversation, you can tell the model to run the functions asynchronously. 
To do so, you first need to add a behavior to the function definitions: Python # Non-blocking function definitions turn_on_the_lights = { "name" : "turn_on_the_lights" , "behavior" : "NON_BLOCKING" } # turn_on_the_lights will run asynchronously turn_off_the_lights = { "name" : "turn_off_the_lights" } # turn_off_the_lights will still pause all interactions with the model JavaScript import { GoogleGenAI , Modality , Behavior } from '@google/genai' ; // Non-blocking function definitions const turn_on_the_lights = { name : "turn_on_the_lights" , behavior : Behavior . NON_BLOCKING } // Blocking function definitions const turn_off_the_lights = { name : "turn_off_the_lights" } const tools = [{ functionDeclarations : [ turn_on_the_lights , turn_off_the_lights ] }] NON-BLOCKING ensures the function runs asynchronously while you can continue interacting with the model. Then you need to tell the model how to behave when it receives the FunctionResponse using the scheduling parameter. It can either: Interrupt what it's doing and tell you about the response it got right away ( scheduling="INTERRUPT" ), Wait until it's finished with what it's currently doing ( scheduling="WHEN_IDLE" ), Or do nothing and use that knowledge later on in the discussion ( scheduling="SILENT" ) Python # for a non-blocking function definition, apply scheduling in the function response: function_response = types . FunctionResponse ( id = fc . id , name = fc . name , response = { "result" : "ok" , "scheduling" : "INTERRUPT" # Can also be WHEN_IDLE or SILENT } ) JavaScript import { GoogleGenAI , Modality , Behavior , FunctionResponseScheduling } from '@google/genai' ; // for a non-blocking function definition, apply scheduling in the function response: const functionResponse = { id : fc . id , name : fc . name , response : { result : "ok" , scheduling : FunctionResponseScheduling . INTERRUPT // Can also be WHEN_IDLE or SILENT } } Code execution You can define code execution as part of the session configuration. This lets the Live API generate and execute Python code and dynamically perform computations to benefit your results. See the Code execution tutorial to learn more. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" tools = [{ 'code_execution' : {}}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "Compute the largest prime palindrome under 100000." await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) model_turn = chunk . server_content . model_turn if model_turn : for part in model_turn . parts : if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . output ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const tools = [{ codeExecution : {}}] const config = { responseModalities : [ Modality . TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . 
shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Compute the largest prime palindrome under 100000.' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } else if ( part . executableCode ) { console . debug ( 'executableCode: %s\n' , part . executableCode . code ); } else if ( part . codeExecutionResult ) { console . debug ( 'codeExecutionResult: %s\n' , part . codeExecutionResult . output ); } } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Grounding with Google Search You can enable Grounding with Google Search as part of the session configuration. This increases the Live API's accuracy and prevents hallucinations. See the Grounding tutorial to learn more. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" tools = [{ 'google_search' : {}}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "When did the last Brazil vs. Argentina soccer match happen?" await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) # The model might generate and execute Python code to use Search model_turn = chunk . server_content . model_turn if model_turn : for part in model_turn . parts : if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . output ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const tools = [{ googleSearch : {}}] const config = { responseModalities : [ Modality . TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! 
done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'When did the last Brazil vs. Argentina soccer match happen?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } else if ( part . executableCode ) { console . debug ( 'executableCode: %s\n' , part . executableCode . code ); } else if ( part . codeExecutionResult ) { console . debug ( 'codeExecutionResult: %s\n' , part . codeExecutionResult . output ); } } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Combining multiple tools You can combine multiple tools within the Live API, increasing your application's capabilities even more: Python prompt = """ Hey, I need you to do three things for me. 1. Compute the largest prime palindrome under 100000. 2. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024? 3. Turn on the lights Thanks! """ tools = [ { "google_search" : {}}, { "code_execution" : {}}, { "function_declarations" : [ turn_on_the_lights , turn_off_the_lights ]}, ] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } # ... remaining model call JavaScript const prompt = `Hey, I need you to do three things for me. 1. Compute the largest prime palindrome under 100000. 2. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024? 3. Turn on the lights Thanks! ` const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turn_on_the_lights , turn_off_the_lights ] } ] const config = { responseModalities : [ Modality . TEXT ], tools : tools } // ... remaining model call What's next Check out more examples of using tools with the Live API in the Tool use cookbook . Get the full story on features and configurations from the Live API Capabilities guide . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
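The asynchronous function calling pieces above (the NON_BLOCKING behavior flag and the scheduling field on the FunctionResponse) appear only as isolated snippets in the page text. The following is a minimal sketch, not taken from the page, of how they might be wired into a single receive loop: the light-control functions are stubs whose results are hard-coded to "ok", and WHEN_IDLE is just one of the three scheduling options listed above.

import asyncio
from google import genai
from google.genai import types

client = genai.Client()
model = "gemini-live-2.5-flash-preview"

# NON_BLOCKING: turn_on_the_lights runs asynchronously; turn_off_the_lights still blocks.
turn_on_the_lights = {"name": "turn_on_the_lights", "behavior": "NON_BLOCKING"}
turn_off_the_lights = {"name": "turn_off_the_lights"}

config = {
    "response_modalities": ["TEXT"],
    "tools": [{"function_declarations": [turn_on_the_lights, turn_off_the_lights]}],
}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        await session.send_client_content(
            turns={"parts": [{"text": "Turn on the lights please"}]}
        )
        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)
            elif chunk.tool_call:
                function_responses = []
                for fc in chunk.tool_call.function_calls:
                    function_responses.append(
                        types.FunctionResponse(
                            id=fc.id,
                            name=fc.name,
                            # Stubbed result; WHEN_IDLE reports it once the current model turn finishes.
                            response={"result": "ok", "scheduling": "WHEN_IDLE"},
                        )
                    )
                await session.send_tool_response(function_responses=function_responses)

if __name__ == "__main__":
    asyncio.run(main())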
text_content/docs_live-tools_e7e6ec29.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/live-tools#main-content
  Title: Tool use with Live API  |  Gemini API  |  Google AI for Developers
  ==================================================
 
- Tool use with Live API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Tool use with Live API Tool use allows Live API to go beyond just conversation by enabling it to perform actions in the real-world and pull in external context while maintaining a real time connection. You can define tools such as Function calling , Code execution , and Google Search with the Live API. Overview of supported tools Here's a brief overview of the available tools for each model: Tool Cascaded models gemini-live-2.5-flash-preview gemini-2.0-flash-live-001 gemini-2.5-flash-preview-native-audio-dialog gemini-2.5-flash-exp-native-audio-thinking-dialog Search Yes Yes Yes Function calling Yes Yes No Code execution Yes No No Url context Yes No No Function calling Live API supports function calling, just like regular content generation requests. Function calling lets the Live API interact with external data and programs, greatly increasing what your applications can accomplish. You can define function declarations as part of the session configuration. After receiving tool calls, the client should respond with a list of FunctionResponse objects using the session.send_tool_response method. See the Function calling tutorial to learn more. Note: Unlike the generateContent API, the Live API doesn't support automatic tool response handling. You must handle tool responses manually in your client code. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" # Simple function definitions turn_on_the_lights = { "name" : "turn_on_the_lights" } turn_off_the_lights = { "name" : "turn_off_the_lights" } tools = [{ "function_declarations" : [ turn_on_the_lights , turn_off_the_lights ]}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "Turn on the lights please" await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) elif chunk . tool_call : function_responses = [] for fc in chunk . tool_call . function_calls : function_response = types . FunctionResponse ( id = fc . id , name = fc . name , response = { "result" : "ok" } # simple, hard-coded function response ) function_responses . append ( function_response ) await session . send_tool_response ( function_responses = function_responses ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; // Simple function definitions const turn_on_the_lights = { name : "turn_on_the_lights" } // , description: '...', parameters: { ... } const turn_off_the_lights = { name : "turn_off_the_lights" } const tools = [{ functionDeclarations : [ turn_on_the_lights , turn_off_the_lights ] }] const config = { responseModalities : [ Modality . 
TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Turn on the lights please' ; session . sendClientContent ({ turns : inputTurns }); let turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } } } else if ( turn . toolCall ) { const functionResponses = []; for ( const fc of turn . toolCall . functionCalls ) { functionResponses . push ({ id : fc . id , name : fc . name , response : { result : "ok" } // simple, hard-coded function response }); } console . debug ( 'Sending tool response...\n' ); session . sendToolResponse ({ functionResponses : functionResponses }); } } // Check again for new messages turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); From a single prompt, the model can generate multiple function calls and the code necessary to chain their outputs. This code executes in a sandbox environment, generating subsequent BidiGenerateContentToolCall messages. Asynchronous function calling Note: Asynchronous function calling is only supported in half-cascade audio generation. Function calling executes sequentially by default, meaning execution pauses until the results of each function call are available. This ensures sequential processing, which means you won't be able to continue interacting with the model while the functions are being run. If you don't want to block the conversation, you can tell the model to run the functions asynchronously. 
To do so, you first need to add a behavior to the function definitions: Python # Non-blocking function definitions turn_on_the_lights = { "name" : "turn_on_the_lights" , "behavior" : "NON_BLOCKING" } # turn_on_the_lights will run asynchronously turn_off_the_lights = { "name" : "turn_off_the_lights" } # turn_off_the_lights will still pause all interactions with the model JavaScript import { GoogleGenAI , Modality , Behavior } from '@google/genai' ; // Non-blocking function definitions const turn_on_the_lights = { name : "turn_on_the_lights" , behavior : Behavior . NON_BLOCKING } // Blocking function definitions const turn_off_the_lights = { name : "turn_off_the_lights" } const tools = [{ functionDeclarations : [ turn_on_the_lights , turn_off_the_lights ] }] NON-BLOCKING ensures the function runs asynchronously while you can continue interacting with the model. Then you need to tell the model how to behave when it receives the FunctionResponse using the scheduling parameter. It can either: Interrupt what it's doing and tell you about the response it got right away ( scheduling="INTERRUPT" ), Wait until it's finished with what it's currently doing ( scheduling="WHEN_IDLE" ), Or do nothing and use that knowledge later on in the discussion ( scheduling="SILENT" ) Python # for a non-blocking function definition, apply scheduling in the function response: function_response = types . FunctionResponse ( id = fc . id , name = fc . name , response = { "result" : "ok" , "scheduling" : "INTERRUPT" # Can also be WHEN_IDLE or SILENT } ) JavaScript import { GoogleGenAI , Modality , Behavior , FunctionResponseScheduling } from '@google/genai' ; // for a non-blocking function definition, apply scheduling in the function response: const functionResponse = { id : fc . id , name : fc . name , response : { result : "ok" , scheduling : FunctionResponseScheduling . INTERRUPT // Can also be WHEN_IDLE or SILENT } } Code execution You can define code execution as part of the session configuration. This lets the Live API generate and execute Python code and dynamically perform computations to benefit your results. See the Code execution tutorial to learn more. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" tools = [{ 'code_execution' : {}}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "Compute the largest prime palindrome under 100000." await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) model_turn = chunk . server_content . model_turn if model_turn : for part in model_turn . parts : if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . output ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const tools = [{ codeExecution : {}}] const config = { responseModalities : [ Modality . TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . 
shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Compute the largest prime palindrome under 100000.' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } else if ( part . executableCode ) { console . debug ( 'executableCode: %s\n' , part . executableCode . code ); } else if ( part . codeExecutionResult ) { console . debug ( 'codeExecutionResult: %s\n' , part . codeExecutionResult . output ); } } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Grounding with Google Search You can enable Grounding with Google Search as part of the session configuration. This increases the Live API's accuracy and prevents hallucinations. See the Grounding tutorial to learn more. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" tools = [{ 'google_search' : {}}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "When did the last Brazil vs. Argentina soccer match happen?" await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) # The model might generate and execute Python code to use Search model_turn = chunk . server_content . model_turn if model_turn : for part in model_turn . parts : if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . output ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const tools = [{ googleSearch : {}}] const config = { responseModalities : [ Modality . TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! 
done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'When did the last Brazil vs. Argentina soccer match happen?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } else if ( part . executableCode ) { console . debug ( 'executableCode: %s\n' , part . executableCode . code ); } else if ( part . codeExecutionResult ) { console . debug ( 'codeExecutionResult: %s\n' , part . codeExecutionResult . output ); } } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Combining multiple tools You can combine multiple tools within the Live API, increasing your application's capabilities even more: Python prompt = """ Hey, I need you to do three things for me. 1. Compute the largest prime palindrome under 100000. 2. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024? 3. Turn on the lights Thanks! """ tools = [ { "google_search" : {}}, { "code_execution" : {}}, { "function_declarations" : [ turn_on_the_lights , turn_off_the_lights ]}, ] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } # ... remaining model call JavaScript const prompt = `Hey, I need you to do three things for me. 1. Compute the largest prime palindrome under 100000. 2. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024? 3. Turn on the lights Thanks! ` const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turn_on_the_lights , turn_off_the_lights ] } ] const config = { responseModalities : [ Modality . TEXT ], tools : tools } // ... remaining model call What's next Check out more examples of using tools with the Live API in the Tool use cookbook . Get the full story on features and configurations from the Live API Capabilities guide . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-08 UTC.
 
  Title: Tool use with Live API  |  Gemini API  |  Google AI for Developers
  ==================================================
 
+ Tool use with Live API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Tool use with Live API Tool use allows Live API to go beyond just conversation by enabling it to perform actions in the real-world and pull in external context while maintaining a real time connection. You can define tools such as Function calling , Code execution , and Google Search with the Live API. Overview of supported tools Here's a brief overview of the available tools for each model: Tool Cascaded models gemini-live-2.5-flash-preview gemini-2.0-flash-live-001 gemini-2.5-flash-preview-native-audio-dialog gemini-2.5-flash-exp-native-audio-thinking-dialog Search Yes Yes Yes Function calling Yes Yes No Code execution Yes No No URL context Yes No No Function calling Live API supports function calling, just like regular content generation requests. Function calling lets the Live API interact with external data and programs, greatly increasing what your applications can accomplish. You can define function declarations as part of the session configuration. After receiving tool calls, the client should respond with a list of FunctionResponse objects using the session.send_tool_response method. See the Function calling tutorial to learn more. Note: Unlike the generateContent API, the Live API doesn't support automatic tool response handling. You must handle tool responses manually in your client code. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" # Simple function definitions turn_on_the_lights = { "name" : "turn_on_the_lights" } turn_off_the_lights = { "name" : "turn_off_the_lights" } tools = [{ "function_declarations" : [ turn_on_the_lights , turn_off_the_lights ]}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "Turn on the lights please" await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) elif chunk . tool_call : function_responses = [] for fc in chunk . tool_call . function_calls : function_response = types . FunctionResponse ( id = fc . id , name = fc . name , response = { "result" : "ok" } # simple, hard-coded function response ) function_responses . append ( function_response ) await session . send_tool_response ( function_responses = function_responses ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; // Simple function definitions const turn_on_the_lights = { name : "turn_on_the_lights" } // , description: '...', parameters: { ... } const turn_off_the_lights = { name : "turn_off_the_lights" } const tools = [{ functionDeclarations : [ turn_on_the_lights , turn_off_the_lights ] }] const config = { responseModalities : [ Modality . 
TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Turn on the lights please' ; session . sendClientContent ({ turns : inputTurns }); let turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } } } else if ( turn . toolCall ) { const functionResponses = []; for ( const fc of turn . toolCall . functionCalls ) { functionResponses . push ({ id : fc . id , name : fc . name , response : { result : "ok" } // simple, hard-coded function response }); } console . debug ( 'Sending tool response...\n' ); session . sendToolResponse ({ functionResponses : functionResponses }); } } // Check again for new messages turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); From a single prompt, the model can generate multiple function calls and the code necessary to chain their outputs. This code executes in a sandbox environment, generating subsequent BidiGenerateContentToolCall messages. Asynchronous function calling Note: Asynchronous function calling is only supported in half-cascade audio generation. Function calling executes sequentially by default, meaning execution pauses until the results of each function call are available. This ensures sequential processing, which means you won't be able to continue interacting with the model while the functions are being run. If you don't want to block the conversation, you can tell the model to run the functions asynchronously. 
To do so, you first need to add a behavior to the function definitions: Python # Non-blocking function definitions turn_on_the_lights = { "name" : "turn_on_the_lights" , "behavior" : "NON_BLOCKING" } # turn_on_the_lights will run asynchronously turn_off_the_lights = { "name" : "turn_off_the_lights" } # turn_off_the_lights will still pause all interactions with the model JavaScript import { GoogleGenAI , Modality , Behavior } from '@google/genai' ; // Non-blocking function definitions const turn_on_the_lights = { name : "turn_on_the_lights" , behavior : Behavior . NON_BLOCKING } // Blocking function definitions const turn_off_the_lights = { name : "turn_off_the_lights" } const tools = [{ functionDeclarations : [ turn_on_the_lights , turn_off_the_lights ] }] NON-BLOCKING ensures the function runs asynchronously while you can continue interacting with the model. Then you need to tell the model how to behave when it receives the FunctionResponse using the scheduling parameter. It can either: Interrupt what it's doing and tell you about the response it got right away ( scheduling="INTERRUPT" ), Wait until it's finished with what it's currently doing ( scheduling="WHEN_IDLE" ), Or do nothing and use that knowledge later on in the discussion ( scheduling="SILENT" ) Python # for a non-blocking function definition, apply scheduling in the function response: function_response = types . FunctionResponse ( id = fc . id , name = fc . name , response = { "result" : "ok" , "scheduling" : "INTERRUPT" # Can also be WHEN_IDLE or SILENT } ) JavaScript import { GoogleGenAI , Modality , Behavior , FunctionResponseScheduling } from '@google/genai' ; // for a non-blocking function definition, apply scheduling in the function response: const functionResponse = { id : fc . id , name : fc . name , response : { result : "ok" , scheduling : FunctionResponseScheduling . INTERRUPT // Can also be WHEN_IDLE or SILENT } } Code execution You can define code execution as part of the session configuration. This lets the Live API generate and execute Python code and dynamically perform computations to benefit your results. See the Code execution tutorial to learn more. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" tools = [{ 'code_execution' : {}}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "Compute the largest prime palindrome under 100000." await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) model_turn = chunk . server_content . model_turn if model_turn : for part in model_turn . parts : if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . output ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const tools = [{ codeExecution : {}}] const config = { responseModalities : [ Modality . TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . 
shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'Compute the largest prime palindrome under 100000.' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } else if ( part . executableCode ) { console . debug ( 'executableCode: %s\n' , part . executableCode . code ); } else if ( part . codeExecutionResult ) { console . debug ( 'codeExecutionResult: %s\n' , part . codeExecutionResult . output ); } } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Grounding with Google Search You can enable Grounding with Google Search as part of the session configuration. This increases the Live API's accuracy and prevents hallucinations. See the Grounding tutorial to learn more. Python import asyncio from google import genai from google.genai import types client = genai . Client () model = "gemini-live-2.5-flash-preview" tools = [{ 'google_search' : {}}] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : prompt = "When did the last Brazil vs. Argentina soccer match happen?" await session . send_client_content ( turns = { "parts" : [{ "text" : prompt }]}) async for chunk in session . receive (): if chunk . server_content : if chunk . text is not None : print ( chunk . text ) # The model might generate and execute Python code to use Search model_turn = chunk . server_content . model_turn if model_turn : for part in model_turn . parts : if part . executable_code is not None : print ( part . executable_code . code ) if part . code_execution_result is not None : print ( part . code_execution_result . output ) if __name__ == "__main__" : asyncio . run ( main ()) JavaScript import { GoogleGenAI , Modality } from '@google/genai' ; const ai = new GoogleGenAI ({}); const model = 'gemini-live-2.5-flash-preview' ; const tools = [{ googleSearch : {}}] const config = { responseModalities : [ Modality . TEXT ], tools : tools } async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! 
done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . turnComplete ) { done = true ; } else if ( message . toolCall ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); const inputTurns = 'When did the last Brazil vs. Argentina soccer match happen?' ; session . sendClientContent ({ turns : inputTurns }); const turns = await handleTurn (); for ( const turn of turns ) { if ( turn . serverContent && turn . serverContent . modelTurn && turn . serverContent . modelTurn . parts ) { for ( const part of turn . serverContent . modelTurn . parts ) { if ( part . text ) { console . debug ( 'Received text: %s\n' , part . text ); } else if ( part . executableCode ) { console . debug ( 'executableCode: %s\n' , part . executableCode . code ); } else if ( part . codeExecutionResult ) { console . debug ( 'codeExecutionResult: %s\n' , part . codeExecutionResult . output ); } } } } session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); Combining multiple tools You can combine multiple tools within the Live API, increasing your application's capabilities even more: Python prompt = """ Hey, I need you to do three things for me. 1. Compute the largest prime palindrome under 100000. 2. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024? 3. Turn on the lights Thanks! """ tools = [ { "google_search" : {}}, { "code_execution" : {}}, { "function_declarations" : [ turn_on_the_lights , turn_off_the_lights ]}, ] config = { "response_modalities" : [ "TEXT" ], "tools" : tools } # ... remaining model call JavaScript const prompt = `Hey, I need you to do three things for me. 1. Compute the largest prime palindrome under 100000. 2. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024? 3. Turn on the lights Thanks! ` const tools = [ { googleSearch : {} }, { codeExecution : {} }, { functionDeclarations : [ turn_on_the_lights , turn_off_the_lights ] } ] const config = { responseModalities : [ Modality . TEXT ], tools : tools } // ... remaining model call What's next Check out more examples of using tools with the Live API in the Tool use cookbook . Get the full story on features and configurations from the Live API Capabilities guide . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
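The combined-tools example above stops at "# ... remaining model call". One possible shape for that call, assembled only from the receive loops already shown on this page (text chunks, executable code, code execution results, and tool calls handled together), is sketched below; it is not the page's omitted code, and the function responses are hard-coded to "ok" as in the earlier examples.

import asyncio
from google import genai
from google.genai import types

client = genai.Client()
model = "gemini-live-2.5-flash-preview"

turn_on_the_lights = {"name": "turn_on_the_lights"}
turn_off_the_lights = {"name": "turn_off_the_lights"}

prompt = """
Hey, I need you to do three things for me.
1. Compute the largest prime palindrome under 100000.
2. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024?
3. Turn on the lights
Thanks!
"""

tools = [
    {"google_search": {}},
    {"code_execution": {}},
    {"function_declarations": [turn_on_the_lights, turn_off_the_lights]},
]
config = {"response_modalities": ["TEXT"], "tools": tools}

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        await session.send_client_content(turns={"parts": [{"text": prompt}]})
        async for chunk in session.receive():
            if chunk.server_content:
                if chunk.text is not None:
                    print(chunk.text)
                model_turn = chunk.server_content.model_turn
                if model_turn:
                    for part in model_turn.parts:
                        if part.executable_code is not None:
                            print(part.executable_code.code)
                        if part.code_execution_result is not None:
                            print(part.code_execution_result.output)
            elif chunk.tool_call:
                # Stubbed "ok" results, as in the function calling example above.
                function_responses = [
                    types.FunctionResponse(id=fc.id, name=fc.name, response={"result": "ok"})
                    for fc in chunk.tool_call.function_calls
                ]
                await session.send_tool_response(function_responses=function_responses)

if __name__ == "__main__":
    asyncio.run(main())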
text_content/docs_live_1b8c48ea.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/live#audio-to-audio
  Title: Get started with Live API  |  Gemini API  |  Google AI for Developers
  ==================================================
 
- Get started with Live API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Get started with Live API Preview: The Live API is in preview. The Live API enables low-latency, real-time voice and video interactions with Gemini. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses, creating a natural conversational experience for your users. Live API offers a comprehensive set of features such as Voice Activity Detection , tool use and function calling , session management (for managing long running conversations) and ephemeral tokens (for secure client-sided authentication). This page gets you up and running with examples and basic code samples. Example applications Check out the following example applications that illustrate how to use Live API for end-to-end use cases: Live audio starter app on AI Studio, using JavaScript libraries to connect to Live API and stream bidirectional audio through your microphone and speakers. Live API Python cookbook using Pyaudio that connects to Live API. Partner integrations If you prefer a simpler development process, you can use Daily or LiveKit . These are third-party partner platforms that have already integrated the Gemini Live API over the WebRTC protocol to streamline the development of real-time audio and video applications. Before you begin building There are two important decisions to make before you begin building with the Live API: choosing a model and choosing an implementation approach. Choose an audio generation architecture If you're building an audio-based use case, your choice of model determines the audio generation architecture used to create the audio response: Native audio : This option provides the most natural and realistic-sounding speech and better multilingual performance. It also enables advanced features like affective (emotion-aware) dialogue , proactive audio (where the model can decide to ignore or respond to certain inputs), and "thinking" . Native audio is supported by the following native audio models : gemini-2.5-flash-preview-native-audio-dialog gemini-2.5-flash-exp-native-audio-thinking-dialog Half-cascade audio : This option uses a cascaded model architecture (native audio input and text-to-speech output). It offers better performance and reliability in production environments, especially with tool use . Half-cascaded audio is supported by the following models: gemini-live-2.5-flash-preview gemini-2.0-flash-live-001 Choose an implementation approach When integrating with Live API, you'll need to choose one of the following implementation approaches: Server-to-server : Your backend connects to the Live API using WebSockets . Typically, your client sends stream data (audio, video, text) to your server, which then forwards it to the Live API. Client-to-server : Your frontend code connects directly to the Live API using WebSockets to stream data, bypassing your backend. Note: Client-to-server generally offers better performance for streaming audio and video, since it bypasses the need to send the stream to your backend first. 
It's also easier to set up since you don't need to implement a proxy that sends data from your client to your server and then your server to the API. However, for production environments, in order to mitigate security risks, we recommend using ephemeral tokens instead of standard API keys. Get started This example reads a WAV file , sends it in the correct format, and saves the received data as WAV file. You can send audio by converting it to 16-bit PCM, 16kHz, mono format, and you can receive audio by setting AUDIO as response modality. The output uses a sample rate of 24kHz. Python # Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav # Install helpers for converting files: pip install librosa soundfile import asyncio import io from pathlib import Path import wave from google import genai from google.genai import types import soundfile as sf import librosa client = genai . Client () # Half cascade model: # model = "gemini-live-2.5-flash-preview" # Native audio output model: model = "gemini-2.5-flash-preview-native-audio-dialog" config = { "response_modalities" : [ "AUDIO" ], "system_instruction" : "You are a helpful assistant and answer in a friendly tone." , } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : buffer = io . BytesIO () y , sr = librosa . load ( "sample.wav" , sr = 16000 ) sf . write ( buffer , y , sr , format = 'RAW' , subtype = 'PCM_16' ) buffer . seek ( 0 ) audio_bytes = buffer . read () # If already in correct format, you can use this: # audio_bytes = Path("sample.pcm").read_bytes() await session . send_realtime_input ( audio = types . Blob ( data = audio_bytes , mime_type = "audio/pcm;rate=16000" ) ) wf = wave . open ( "audio.wav" , "wb" ) wf . setnchannels ( 1 ) wf . setsampwidth ( 2 ) wf . setframerate ( 24000 ) # Output is 24kHz async for response in session . receive (): if response . data is not None : wf . writeframes ( response . data ) # Un-comment this code to print audio data info # if response.server_content.model_turn is not None: # print(response.server_content.model_turn.parts[0].inline_data.mime_type) wf . close () if __name__ == "__main__" : asyncio . run ( main ()) JavaScript // Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; import pkg from 'wavefile' ; // npm install wavefile const { WaveFile } = pkg ; const ai = new GoogleGenAI ({}); // WARNING: Do not use API keys in client-side (browser based) applications // Consider using Ephemeral Tokens instead // More information at: https://ai.google.dev/gemini-api/docs/ephemeral-tokens // Half cascade model: // const model = "gemini-live-2.5-flash-preview" // Native audio output model: const model = "gemini-2.5-flash-preview-native-audio-dialog" const config = { responseModalities : [ Modality . AUDIO ], systemInstruction : "You are a helpful assistant and answer in a friendly tone." }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . 
turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send Audio Chunk const fileBuffer = fs . readFileSync ( "sample.wav" ); // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono) const wav = new WaveFile (); wav . fromBuffer ( fileBuffer ); wav . toSampleRate ( 16000 ); wav . toBitDepth ( "16" ); const base64Audio = wav . toBase64 (); // If already in correct format, you can use this: // const fileBuffer = fs.readFileSync("sample.pcm"); // const base64Audio = Buffer.from(fileBuffer).toString('base64'); session . sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); const turns = await handleTurn (); // Combine audio data strings and save as wave file const combinedAudio = turns . reduce (( acc , turn ) = > { if ( turn . data ) { const buffer = Buffer . from ( turn . data , 'base64' ); const intArray = new Int16Array ( buffer . buffer , buffer . byteOffset , buffer . byteLength / Int16Array . BYTES_PER_ELEMENT ); return acc . concat ( Array . from ( intArray )); } return acc ; }, []); const audioBuffer = new Int16Array ( combinedAudio ); const wf = new WaveFile (); wf . fromScratch ( 1 , 24000 , '16' , audioBuffer ); // output is 24kHz fs . writeFileSync ( 'audio.wav' , wf . toBuffer ()); session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); What's next Read the full Live API Capabilities guide for key capabilities and configurations; including Voice Activity Detection and native audio features. Read the Tool use guide to learn how to integrate Live API with tools and function calling. Read the Session management guide for managing long running conversations. Read the Ephemeral tokens guide for secure authentication in client-to-server applications. For more information about the underlying WebSockets API, see the WebSockets API reference . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-08 UTC.
 
  Title: Get started with Live API  |  Gemini API  |  Google AI for Developers
  ==================================================
 
+ Get started with Live API | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Get started with Live API Preview: The Live API is in preview. The Live API enables low-latency, real-time voice and video interactions with Gemini. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses, creating a natural conversational experience for your users. Live API offers a comprehensive set of features such as Voice Activity Detection , tool use and function calling , session management (for managing long running conversations) and ephemeral tokens (for secure client-sided authentication). This page gets you up and running with examples and basic code samples. Example applications Check out the following example applications that illustrate how to use Live API for end-to-end use cases: Live audio starter app on AI Studio, using JavaScript libraries to connect to Live API and stream bidirectional audio through your microphone and speakers. Live API Python cookbook using Pyaudio that connects to Live API. Partner integrations If you prefer a simpler development process, you can use Daily , LiveKit or Voximplant . These are third-party partner platforms that have already integrated the Gemini Live API over the WebRTC protocol to streamline the development of real-time audio and video applications. Before you begin building There are two important decisions to make before you begin building with the Live API: choosing a model and choosing an implementation approach. Choose an audio generation architecture If you're building an audio-based use case, your choice of model determines the audio generation architecture used to create the audio response: Native audio : This option provides the most natural and realistic-sounding speech and better multilingual performance. It also enables advanced features like affective (emotion-aware) dialogue , proactive audio (where the model can decide to ignore or respond to certain inputs), and "thinking" . Native audio is supported by the following native audio models : gemini-2.5-flash-preview-native-audio-dialog gemini-2.5-flash-exp-native-audio-thinking-dialog Half-cascade audio : This option uses a cascaded model architecture (native audio input and text-to-speech output). It offers better performance and reliability in production environments, especially with tool use . Half-cascaded audio is supported by the following models: gemini-live-2.5-flash-preview gemini-2.0-flash-live-001 Choose an implementation approach When integrating with Live API, you'll need to choose one of the following implementation approaches: Server-to-server : Your backend connects to the Live API using WebSockets . Typically, your client sends stream data (audio, video, text) to your server, which then forwards it to the Live API. Client-to-server : Your frontend code connects directly to the Live API using WebSockets to stream data, bypassing your backend. Note: Client-to-server generally offers better performance for streaming audio and video, since it bypasses the need to send the stream to your backend first. 
It's also easier to set up since you don't need to implement a proxy that sends data from your client to your server and then your server to the API. However, for production environments, in order to mitigate security risks, we recommend using ephemeral tokens instead of standard API keys. Get started This example reads a WAV file , sends it in the correct format, and saves the received data as WAV file. You can send audio by converting it to 16-bit PCM, 16kHz, mono format, and you can receive audio by setting AUDIO as response modality. The output uses a sample rate of 24kHz. Python # Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav # Install helpers for converting files: pip install librosa soundfile import asyncio import io from pathlib import Path import wave from google import genai from google.genai import types import soundfile as sf import librosa client = genai . Client () # Half cascade model: # model = "gemini-live-2.5-flash-preview" # Native audio output model: model = "gemini-2.5-flash-preview-native-audio-dialog" config = { "response_modalities" : [ "AUDIO" ], "system_instruction" : "You are a helpful assistant and answer in a friendly tone." , } async def main (): async with client . aio . live . connect ( model = model , config = config ) as session : buffer = io . BytesIO () y , sr = librosa . load ( "sample.wav" , sr = 16000 ) sf . write ( buffer , y , sr , format = 'RAW' , subtype = 'PCM_16' ) buffer . seek ( 0 ) audio_bytes = buffer . read () # If already in correct format, you can use this: # audio_bytes = Path("sample.pcm").read_bytes() await session . send_realtime_input ( audio = types . Blob ( data = audio_bytes , mime_type = "audio/pcm;rate=16000" ) ) wf = wave . open ( "audio.wav" , "wb" ) wf . setnchannels ( 1 ) wf . setsampwidth ( 2 ) wf . setframerate ( 24000 ) # Output is 24kHz async for response in session . receive (): if response . data is not None : wf . writeframes ( response . data ) # Un-comment this code to print audio data info # if response.server_content.model_turn is not None: # print(response.server_content.model_turn.parts[0].inline_data.mime_type) wf . close () if __name__ == "__main__" : asyncio . run ( main ()) JavaScript // Test file: https://storage.googleapis.com/generativeai-downloads/data/16000.wav import { GoogleGenAI , Modality } from '@google/genai' ; import * as fs from "node:fs" ; import pkg from 'wavefile' ; // npm install wavefile const { WaveFile } = pkg ; const ai = new GoogleGenAI ({}); // WARNING: Do not use API keys in client-side (browser based) applications // Consider using Ephemeral Tokens instead // More information at: https://ai.google.dev/gemini-api/docs/ephemeral-tokens // Half cascade model: // const model = "gemini-live-2.5-flash-preview" // Native audio output model: const model = "gemini-2.5-flash-preview-native-audio-dialog" const config = { responseModalities : [ Modality . AUDIO ], systemInstruction : "You are a helpful assistant and answer in a friendly tone." }; async function live () { const responseQueue = []; async function waitMessage () { let done = false ; let message = undefined ; while ( ! done ) { message = responseQueue . shift (); if ( message ) { done = true ; } else { await new Promise (( resolve ) = > setTimeout ( resolve , 100 )); } } return message ; } async function handleTurn () { const turns = []; let done = false ; while ( ! done ) { const message = await waitMessage (); turns . push ( message ); if ( message . serverContent && message . serverContent . 
turnComplete ) { done = true ; } } return turns ; } const session = await ai . live . connect ({ model : model , callbacks : { onopen : function () { console . debug ( 'Opened' ); }, onmessage : function ( message ) { responseQueue . push ( message ); }, onerror : function ( e ) { console . debug ( 'Error:' , e . message ); }, onclose : function ( e ) { console . debug ( 'Close:' , e . reason ); }, }, config : config , }); // Send Audio Chunk const fileBuffer = fs . readFileSync ( "sample.wav" ); // Ensure audio conforms to API requirements (16-bit PCM, 16kHz, mono) const wav = new WaveFile (); wav . fromBuffer ( fileBuffer ); wav . toSampleRate ( 16000 ); wav . toBitDepth ( "16" ); const base64Audio = wav . toBase64 (); // If already in correct format, you can use this: // const fileBuffer = fs.readFileSync("sample.pcm"); // const base64Audio = Buffer.from(fileBuffer).toString('base64'); session . sendRealtimeInput ( { audio : { data : base64Audio , mimeType : "audio/pcm;rate=16000" } } ); const turns = await handleTurn (); // Combine audio data strings and save as wave file const combinedAudio = turns . reduce (( acc , turn ) = > { if ( turn . data ) { const buffer = Buffer . from ( turn . data , 'base64' ); const intArray = new Int16Array ( buffer . buffer , buffer . byteOffset , buffer . byteLength / Int16Array . BYTES_PER_ELEMENT ); return acc . concat ( Array . from ( intArray )); } return acc ; }, []); const audioBuffer = new Int16Array ( combinedAudio ); const wf = new WaveFile (); wf . fromScratch ( 1 , 24000 , '16' , audioBuffer ); // output is 24kHz fs . writeFileSync ( 'audio.wav' , wf . toBuffer ()); session . close (); } async function main () { await live (). catch (( e ) = > console . error ( 'got error' , e )); } main (); What's next Read the full Live API Capabilities guide for key capabilities and configurations; including Voice Activity Detection and native audio features. Read the Tool use guide to learn how to integrate Live API with tools and function calling. Read the Session management guide for managing long running conversations. Read the Ephemeral tokens guide for secure authentication in client-to-server applications. For more information about the underlying WebSockets API, see the WebSockets API reference . Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
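Note: The audio examples above show the full round trip, but the same connection flow can be sanity-checked with plain text before any audio plumbing is wired up. The snippet below is a minimal sketch only: it assumes the TEXT response modality and the send_client_content helper of the Python google-genai SDK (both covered in the Live API capabilities guide rather than shown on this page), so treat the exact names as illustrative.

# Minimal text-in / text-out sketch for the Live API (illustrative; see caveats above).
import asyncio
from google import genai

client = genai.Client()
model = "gemini-live-2.5-flash-preview"  # half-cascade model listed above
config = {"response_modalities": ["TEXT"]}  # assumes TEXT is accepted the same way AUDIO is

async def main():
    async with client.aio.live.connect(model=model, config=config) as session:
        # send_client_content is an assumption here; it sends a complete user turn as text.
        await session.send_client_content(
            turns={"role": "user", "parts": [{"text": "Hello, can you hear me?"}]},
            turn_complete=True,
        )
        async for response in session.receive():
            if response.text is not None:
                print(response.text, end="")

if __name__ == "__main__":
    asyncio.run(main())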
text_content/docs_migrate_8812abe1.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/migrate#main-content
  Title: Migrate to the Google GenAI SDK  |  Gemini API  |  Google AI for Developers
  ==================================================
 
- Migrate to the Google GenAI SDK | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Migrate to the Google GenAI SDK Starting with the Gemini 2.0 release in late 2024, we introduced a new set of libraries called the Google GenAI SDK . It offers an improved developer experience through an updated client architecture , and simplifies the transition between developer and enterprise workflows. The Google GenAI SDK is now in General Availability (GA) across all supported platforms. If you're using one of our legacy libraries , we strongly recommend you to migrate. This guide provides before-and-after examples of migrated code to help you get started. Note: The Go examples omit imports and other boilerplate code to improve readability. Installation Before Python pip install -U -q "google-generativeai" JavaScript npm install @google/generative-ai Go go get github.com/google/generative-ai-go After Python pip install -U -q "google-genai" JavaScript npm install @google/genai Go go get google.golang.org/genai API access The old SDK implicitly handled the API client behind the scenes using a variety of ad hoc methods. This made it hard to manage the client and credentials. Now, you interact through a central Client object. This Client object acts as a single entry point for various API services (e.g., models , chats , files , tunings ), promoting consistency and simplifying credential and configuration management across different API calls. Before (Less Centralized API Access) Python The old SDK didn't explicitly use a top-level client object for most API calls. You would directly instantiate and interact with GenerativeModel objects. import google.generativeai as genai # Directly create and use model objects model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ( ... ) chat = model . start_chat ( ... ) JavaScript While GoogleGenerativeAI was a central point for models and chat, other functionalities like file and cache management often required importing and instantiating entirely separate client classes. import { GoogleGenerativeAI } from "@google/generative-ai" ; import { GoogleAIFileManager , GoogleAICacheManager } from "@google/generative-ai/server" ; // For files/caching const genAI = new GoogleGenerativeAI ( "YOUR_API_KEY" ); const fileManager = new GoogleAIFileManager ( "YOUR_API_KEY" ); const cacheManager = new GoogleAICacheManager ( "YOUR_API_KEY" ); // Get a model instance, then call methods on it const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); const result = await model . generateContent (...); const chat = model . startChat (...); // Call methods on separate client objects for other services const uploadedFile = await fileManager . uploadFile (...); const cache = await cacheManager . create (...); Go The genai.NewClient function created a client, but generative model operations were typically called on a separate GenerativeModel instance obtained from this client. Other services might have been accessed via distinct packages or patterns. 
import ( "github.com/google/generative-ai-go/genai" "github.com/google/generative-ai-go/genai/fileman" // For files "google.golang.org/api/option" ) client , err := genai . NewClient ( ctx , option . WithAPIKey ( "YOUR_API_KEY" )) fileClient , err := fileman . NewClient ( ctx , option . WithAPIKey ( "YOUR_API_KEY" )) // Get a model instance, then call methods on it model := client . GenerativeModel ( "gemini-1.5-flash" ) resp , err := model . GenerateContent ( ... ) cs := model . StartChat () // Call methods on separate client objects for other services uploadedFile , err := fileClient . UploadFile ( ... ) After (Centralized Client Object) Python from google import genai # Create a single client object client = genai . Client () # Access API methods through services on the client object response = client . models . generate_content ( ... ) chat = client . chats . create ( ... ) my_file = client . files . upload ( ... ) tuning_job = client . tunings . tune ( ... ) JavaScript import { GoogleGenAI } from "@google/genai" ; // Create a single client object const ai = new GoogleGenAI ({ apiKey : "YOUR_API_KEY" }); // Access API methods through services on the client object const response = await ai . models . generateContent (...); const chat = ai . chats . create (...); const uploadedFile = await ai . files . upload (...); const cache = await ai . caches . create (...); Go import "google.golang.org/genai" // Create a single client object client , err := genai . NewClient ( ctx , nil ) // Access API methods through services on the client object result , err := client . Models . GenerateContent ( ... ) chat , err := client . Chats . Create ( ... ) uploadedFile , err := client . Files . Upload ( ... ) tuningJob , err := client . Tunings . Tune ( ... ) Authentication Both legacy and new libraries authenticate using API keys. You can create your API key in Google AI Studio. Before Python The old SDK handled the API client object implicitly. import google.generativeai as genai genai . configure ( api_key =... ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); Go Import the Google libraries: import ( "github.com/google/generative-ai-go/genai" "google.golang.org/api/option" ) Create the client: client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) After Python With Google GenAI SDK, you create an API client first, which is used to call the API. The new SDK will pick up your API key from either one of the GEMINI_API_KEY or GOOGLE_API_KEY environment variables, if you don't pass one to the client. export GEMINI_API_KEY = "YOUR_API_KEY" from google import genai client = genai . Client () # Set the API key using the GEMINI_API_KEY env var. # Alternatively, you could set the API key explicitly: # client = genai.Client(api_key="your_api_key") JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : "GEMINI_API_KEY" }); Go Import the GenAI library: import "google.golang.org/genai" Create the client: client , err := genai . NewClient ( ctx , & genai . ClientConfig { Backend : genai . BackendGeminiAPI , }) Generate content Text Before Python Previously, there were no client objects, you accessed APIs directly through GenerativeModel objects. import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ( 'Tell me a story in 300 words' ) print ( response . 
text ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( process . env . API_KEY ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); const prompt = "Tell me a story in 300 words" ; const result = await model . generateContent ( prompt ); console . log ( result . response . text ()); Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client . Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) resp , err := model . GenerateContent ( ctx , genai . Text ( "Tell me a story in 300 words." )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) // utility for printing response parts After Python The new Google GenAI SDK provides access to all the API methods through the Client object. Except for a few stateful special cases ( chat and live-api session s), these are all stateless functions. For utility and uniformity, objects returned are pydantic classes. from google import genai client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'Tell me a story in 300 words.' ) print ( response . text ) print ( response . model_dump_json ( exclude_none = True , indent = 4 )) JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : "Tell me a story in 300 words." , }); console . log ( response . text ); Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , err := client . Models . GenerateContent ( ctx , "gemini-2.0-flash" , genai . Text ( "Tell me a story in 300 words." ), nil ) if err != nil { log . Fatal ( err ) } debugPrint ( result ) // utility for printing result Image Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ([ 'Tell me a story based on this image' , Image . open ( image_path ) ]) print ( response . text ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); function fileToGenerativePart ( path , mimeType ) { return { inlineData : { data : Buffer . from ( fs . readFileSync ( path )). toString ( "base64" ), mimeType , }, }; } const prompt = "Tell me a story based on this image" ; const imagePart = fileToGenerativePart ( `path/to/organ.jpg` , "image/jpeg" , ); const result = await model . generateContent ([ prompt , imagePart ]); console . log ( result . response . text ()); Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client . Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) imgData , err := os . ReadFile ( "path/to/organ.jpg" ) if err != nil { log . Fatal ( err ) } resp , err := model . GenerateContent ( ctx , genai . Text ( "Tell me about this instrument" ), genai . ImageData ( "jpeg" , imgData )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) // utility for printing response After Python Many of the same convenience features exist in the new SDK. 
For example, PIL.Image objects are automatically converted. from google import genai from PIL import Image client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = [ 'Tell me a story based on this image' , Image . open ( image_path ) ] ) print ( response . text ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const organ = await ai . files . upload ({ file : "path/to/organ.jpg" , }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : [ createUserContent ([ "Tell me a story based on this image" , createPartFromUri ( organ . uri , organ . mimeType ) ]), ], }); console . log ( response . text ); Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imgData , err := os . ReadFile ( "path/to/organ.jpg" ) if err != nil { log . Fatal ( err ) } parts := [] * genai . Part { { Text : "Tell me a story based on this image" }, { InlineData : & genai . Blob { Data : imgData , MIMEType : "image/jpeg" }}, } contents := [] * genai . Content { { Parts : parts }, } result , err := client . Models . GenerateContent ( ctx , "gemini-2.0-flash" , contents , nil ) if err != nil { log . Fatal ( err ) } debugPrint ( result ) // utility for printing result Streaming Before Python import google.generativeai as genai response = model . generate_content ( "Write a cute story about cats." , stream = True ) for chunk in response : print ( chunk . text ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); const prompt = "Write a story about a magic backpack." ; const result = await model . generateContentStream ( prompt ); // Print text as it comes in. for await ( const chunk of result . stream ) { const chunkText = chunk . text (); process . stdout . write ( chunkText ); } Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client . Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) iter := model . GenerateContentStream ( ctx , genai . Text ( "Write a story about a magic backpack." )) for { resp , err := iter . Next () if err == iterator . Done { break } if err != nil { log . Fatal ( err ) } printResponse ( resp ) // utility for printing the response } After Python from google import genai client = genai . Client () for chunk in client . models . generate_content_stream ( model = 'gemini-2.0-flash' , contents = 'Tell me a story in 300 words.' ): print ( chunk . text ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . generateContentStream ({ model : "gemini-2.0-flash" , contents : "Write a story about a magic backpack." , }); let text = "" ; for await ( const chunk of response ) { console . log ( chunk . text ); text += chunk . text ; } Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } for result , err := range client . Models . GenerateContentStream ( ctx , "gemini-2.0-flash" , genai . Text ( "Write a story about a magic backpack." ), nil , ) { if err != nil { log . Fatal ( err ) } fmt . Print ( result . Candidates [ 0 ]. Content . 
Parts [ 0 ]. Text ) } Configuration Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' , system_instruction = 'you are a story teller for kids under 5 years old' , generation_config = genai . GenerationConfig ( max_output_tokens = 400 , top_k = 2 , top_p = 0.5 , temperature = 0.5 , response_mime_type = 'application/json' , stop_sequences = [ ' \n ' ], ) ) response = model . generate_content ( 'tell me a story in 100 words' ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" , generationConfig : { candidateCount : 1 , stopSequences : [ "x" ], maxOutputTokens : 20 , temperature : 1.0 , }, }); const result = await model . generateContent ( "Tell me a story about a magic backpack." , ); console . log ( result . response . text ()) Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client . Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) model . SetTemperature ( 0.5 ) model . SetTopP ( 0.5 ) model . SetTopK ( 2.0 ) model . SetMaxOutputTokens ( 100 ) model . ResponseMIMEType = "application/json" resp , err := model . GenerateContent ( ctx , genai . Text ( "Tell me about New York" )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) // utility for printing response After Python For all methods in the new SDK, the required arguments are provided as keyword arguments. All optional inputs are provided in the config argument. Config arguments can be specified as either Python dictionaries or Config classes in the google.genai.types namespace. For utility and uniformity, all definitions within the types module are pydantic classes. from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'Tell me a story in 100 words.' , config = types . GenerateContentConfig ( system_instruction = 'you are a story teller for kids under 5 years old' , max_output_tokens = 400 , top_k = 2 , top_p = 0.5 , temperature = 0.5 , response_mime_type = 'application/json' , stop_sequences = [ ' \n ' ], seed = 42 , ), ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : "Tell me a story about a magic backpack." , config : { candidateCount : 1 , stopSequences : [ "x" ], maxOutputTokens : 20 , temperature : 1.0 , }, }); console . log ( response . text ); Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , err := client . Models . GenerateContent ( ctx , "gemini-2.0-flash" , genai . Text ( "Tell me about New York" ), & genai . GenerateContentConfig { Temperature : genai . Ptr [ float32 ]( 0.5 ), TopP : genai . Ptr [ float32 ]( 0.5 ), TopK : genai . Ptr [ float32 ]( 2.0 ), ResponseMIMEType : "application/json" , StopSequences : [] string { "Yankees" }, CandidateCount : 2 , Seed : genai . Ptr [ int32 ]( 42 ), MaxOutputTokens : 128 , PresencePenalty : genai . Ptr [ float32 ]( 0.5 ), FrequencyPenalty : genai . Ptr [ float32 ]( 0.5 ), }, ) if err != nil { log . 
Fatal ( err ) } debugPrint ( result ) // utility for printing response Safety settings Generate a response with safety settings: Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ( 'say something bad' , safety_settings = { 'HATE' : 'BLOCK_ONLY_HIGH' , 'HARASSMENT' : 'BLOCK_ONLY_HIGH' , } ) JavaScript import { GoogleGenerativeAI , HarmCategory , HarmBlockThreshold } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" , safetySettings : [ { category : HarmCategory . HARM_CATEGORY_HARASSMENT , threshold : HarmBlockThreshold . BLOCK_LOW_AND_ABOVE , }, ], }); const unsafePrompt = "I support Martians Soccer Club and I think " + "Jupiterians Football Club sucks! Write an ironic phrase telling " + "them how I feel about them." ; const result = await model . generateContent ( unsafePrompt ); try { result . response . text (); } catch ( e ) { console . error ( e ); console . log ( result . response . candidates [ 0 ]. safetyRatings ); } After Python from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'say something bad' , config = types . GenerateContentConfig ( safety_settings = [ types . SafetySetting ( category = 'HARM_CATEGORY_HATE_SPEECH' , threshold = 'BLOCK_ONLY_HIGH' ), ] ), ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const unsafePrompt = "I support Martians Soccer Club and I think " + "Jupiterians Football Club sucks! Write an ironic phrase telling " + "them how I feel about them." ; const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : unsafePrompt , config : { safetySettings : [ { category : "HARM_CATEGORY_HARASSMENT" , threshold : "BLOCK_ONLY_HIGH" , }, ], }, }); console . log ( "Finish reason:" , response . candidates [ 0 ]. finishReason ); console . log ( "Safety ratings:" , response . candidates [ 0 ]. safetyRatings ); Async Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content_async ( 'tell me a story in 100 words' ) After Python To use the new SDK with asyncio , there is a separate async implementation of every method under client.aio . from google import genai client = genai . Client () response = await client . aio . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'Tell me a story in 300 words.' ) Chat Start a chat and send a message to the model: Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) chat = model . start_chat () response = chat . send_message ( "Tell me a story in 100 words" ) response = chat . send_message ( "What happened after that?" ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); const chat = model . startChat ({ history : [ { role : "user" , parts : [{ text : "Hello" }], }, { role : "model" , parts : [{ text : "Great to meet you. What would you like to know?" }], }, ], }); let result = await chat . sendMessage ( "I have 2 dogs in my house." ); console . log ( result . response . text ()); result = await chat . 
sendMessage ( "How many paws are in my house?" ); console . log ( result . response . text ()); Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client . Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) cs := model . StartChat () cs . History = [] * genai . Content { { Parts : [] genai . Part { genai . Text ( "Hello, I have 2 dogs in my house." ), }, Role : "user" , }, { Parts : [] genai . Part { genai . Text ( "Great to meet you. What would you like to know?" ), }, Role : "model" , }, } res , err := cs . SendMessage ( ctx , genai . Text ( "How many paws are in my house?" )) if err != nil { log . Fatal ( err ) } printResponse ( res ) // utility for printing the response After Python from google import genai client = genai . Client () chat = client . chats . create ( model = 'gemini-2.0-flash' ) response = chat . send_message ( message = 'Tell me a story in 100 words' ) response = chat . send_message ( message = 'What happened after that?' ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const chat = ai . chats . create ({ model : "gemini-2.0-flash" , history : [ { role : "user" , parts : [{ text : "Hello" }], }, { role : "model" , parts : [{ text : "Great to meet you. What would you like to know?" }], }, ], }); const response1 = await chat . sendMessage ({ message : "I have 2 dogs in my house." , }); console . log ( "Chat response 1:" , response1 . text ); const response2 = await chat . sendMessage ({ message : "How many paws are in my house?" , }); console . log ( "Chat response 2:" , response2 . text ); Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } chat , err := client . Chats . Create ( ctx , "gemini-2.0-flash" , nil , nil ) if err != nil { log . Fatal ( err ) } result , err := chat . SendMessage ( ctx , genai . Part { Text : "Hello, I have 2 dogs in my house." }) if err != nil { log . Fatal ( err ) } debugPrint ( result ) // utility for printing result result , err = chat . SendMessage ( ctx , genai . Part { Text : "How many paws are in my house?" }) if err != nil { log . Fatal ( err ) } debugPrint ( result ) // utility for printing result Function calling Before Python import google.generativeai as genai from enum import Enum def get_current_weather ( location : str ) - > str : """Get the current whether in a given location. Args: location: required, The city and state, e.g. San Franciso, CA unit: celsius or fahrenheit """ print ( f 'Called with: { location =} ' ) return "23C" model = genai . GenerativeModel ( model_name = "gemini-1.5-flash" , tools = [ get_current_weather ] ) response = model . generate_content ( "What is the weather in San Francisco?" ) function_call = response . candidates [ 0 ] . parts [ 0 ] . function_call After Python In the new SDK, automatic function calling is the default. Here, you disable it. from google import genai from google.genai import types client = genai . Client () def get_current_weather ( location : str ) - > str : """Get the current whether in a given location. Args: location: required, The city and state, e.g. San Franciso, CA unit: celsius or fahrenheit """ print ( f 'Called with: { location =} ' ) return "23C" response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = "What is the weather like in Boston?" , config = types . 
GenerateContentConfig ( tools = [ get_current_weather ], automatic_function_calling = { 'disable' : True }, ), ) function_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call Automatic function calling Before Python The old SDK only supports automatic function calling in chat. In the new SDK this is the default behavior in generate_content . import google.generativeai as genai def get_current_weather ( city : str ) - > str : return "23C" model = genai . GenerativeModel ( model_name = "gemini-1.5-flash" , tools = [ get_current_weather ] ) chat = model . start_chat ( enable_automatic_function_calling = True ) result = chat . send_message ( "What is the weather in San Francisco?" ) After Python from google import genai from google.genai import types client = genai . Client () def get_current_weather ( city : str ) - > str : return "23C" response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = "What is the weather like in Boston?" , config = types . GenerateContentConfig ( tools = [ get_current_weather ] ), ) Code execution Code execution is a tool that allows the model to generate Python code, run it, and return the result. Before Python import google.generativeai as genai model = genai . GenerativeModel ( model_name = "gemini-1.5-flash" , tools = "code_execution" ) result = model . generate_content ( "What is the sum of the first 50 prime numbers? Generate and run code for " "the calculation, and make sure you get all 50." ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" , tools : [{ codeExecution : {} }], }); const result = await model . generateContent ( "What is the sum of the first 50 prime numbers? " + "Generate and run code for the calculation, and make sure you get " + "all 50." , ); console . log ( result . response . text ()); After Python from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'What is the sum of the first 50 prime numbers? Generate and run ' 'code for the calculation, and make sure you get all 50.' , config = types . GenerateContentConfig ( tools = [ types . Tool ( code_execution = types . ToolCodeExecution )], ), ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . generateContent ({ model : "gemini-2.0-pro-exp-02-05" , contents : `Write and execute code that calculates the sum of the first 50 prime numbers. Ensure that only the executable code and its resulting output are generated.` , }); // Each part may contain text, executable code, or an execution result. for ( const part of response . candidates [ 0 ]. content . parts ) { console . log ( part ); console . log ( "\n" ); } console . log ( "-" . repeat ( 80 )); // The `.text` accessor concatenates the parts into a markdown-formatted text. console . log ( "\n" , response . text ); Search grounding GoogleSearch (Gemini>=2.0) and GoogleSearchRetrieval (Gemini < 2.0) are tools that allow the model to retrieve public web data for grounding, powered by Google. Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ( contents = "what is the Google stock price?" 
, tools = 'google_search_retrieval' ) After Python from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'What is the Google stock price?' , config = types . GenerateContentConfig ( tools = [ types . Tool ( google_search = types . GoogleSearch () ) ] ) ) JSON response Generate answers in JSON format. Before Python By specifying a response_schema and setting response_mime_type="application/json" users can constrain the model to produce a JSON response following a given structure. import google.generativeai as genai import typing_extensions as typing class CountryInfo ( typing . TypedDict ): name : str population : int capital : str continent : str major_cities : list [ str ] gdp : int official_language : str total_area_sq_mi : int model = genai . GenerativeModel ( model_name = "gemini-1.5-flash" ) result = model . generate_content ( "Give me information of the United States" , generation_config = genai . GenerationConfig ( response_mime_type = "application/json" , response_schema = CountryInfo ), ) JavaScript import { GoogleGenerativeAI , SchemaType } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const schema = { description : "List of recipes" , type : SchemaType . ARRAY , items : { type : SchemaType . OBJECT , properties : { recipeName : { type : SchemaType . STRING , description : "Name of the recipe" , nullable : false , }, }, required : [ "recipeName" ], }, }; const model = genAI . getGenerativeModel ({ model : "gemini-1.5-pro" , generationConfig : { responseMimeType : "application/json" , responseSchema : schema , }, }); const result = await model . generateContent ( "List a few popular cookie recipes." , ); console . log ( result . response . text ()); After Python The new SDK uses pydantic classes to provide the schema (although you can pass a genai.types.Schema , or equivalent dict ). When possible, the SDK will parse the returned JSON, and return the result in response.parsed . If you provided a pydantic class as the schema the SDK will convert that JSON to an instance of the class. from google import genai from pydantic import BaseModel client = genai . Client () class CountryInfo ( BaseModel ): name : str population : int capital : str continent : str major_cities : list [ str ] gdp : int official_language : str total_area_sq_mi : int response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'Give me information of the United States.' , config = { 'response_mime_type' : 'application/json' , 'response_schema' : CountryInfo , }, ) response . parsed JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : "List a few popular cookie recipes." , config : { responseMimeType : "application/json" , responseSchema : { type : "array" , items : { type : "object" , properties : { recipeName : { type : "string" }, ingredients : { type : "array" , items : { type : "string" } }, }, required : [ "recipeName" , "ingredients" ], }, }, }, }); console . log ( response . text ); Files Upload Upload a file: Before Python import requests import pathlib import google.generativeai as genai # Download file response = requests . get ( 'https://storage.googleapis.com/generativeai-downloads/data/a11.txt' ) pathlib . Path ( 'a11.txt' ) . write_text ( response . 
text ) my_file = genai . upload_file ( path = 'a11.txt' ) model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ([ 'Can you summarize this file:' , my_file ]) print ( response . text ) After Python import requests import pathlib from google import genai client = genai . Client () # Download file response = requests . get ( 'https://storage.googleapis.com/generativeai-downloads/data/a11.txt' ) pathlib . Path ( 'a11.txt' ) . write_text ( response . text ) my_file = client . files . upload ( file = 'a11.txt' ) response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = [ 'Can you summarize this file:' , my_file ] ) print ( response . text ) List and get List uploaded files and get an uploaded file with a filename: Before Python import google.generativeai as genai for file in genai . list_files (): print ( file . name ) file = genai . get_file ( name = file . name ) After Python from google import genai client = genai . Client () for file in client . files . list (): print ( file . name ) file = client . files . get ( name = file . name ) Delete Delete a file: Before Python import pathlib import google.generativeai as genai pathlib . Path ( 'dummy.txt' ) . write_text ( 'dummy text' ) dummy_file = genai . upload_file ( path = 'dummy.txt' ) file = genai . delete_file ( name = dummy_file . name ) After Python import pathlib from google import genai client = genai . Client () pathlib . Path ( 'dummy.txt' ) . write_text ( 'dummy text' ) dummy_file = client . files . upload ( file = 'dummy.txt' ) response = client . files . delete ( name = dummy_file . name ) Context caching Context caching allows the user to pass the content to the model once, cache the input tokens, and then refer to the cached tokens in subsequent calls to lower the cost. Before Python import requests import pathlib import google.generativeai as genai from google.generativeai import caching # Download file response = requests . get ( 'https://storage.googleapis.com/generativeai-downloads/data/a11.txt' ) pathlib . Path ( 'a11.txt' ) . write_text ( response . text ) # Upload file document = genai . upload_file ( path = "a11.txt" ) # Create cache apollo_cache = caching . CachedContent . create ( model = "gemini-1.5-flash-001" , system_instruction = "You are an expert at analyzing transcripts." , contents = [ document ], ) # Generate response apollo_model = genai . GenerativeModel . from_cached_content ( cached_content = apollo_cache ) response = apollo_model . generate_content ( "Find a lighthearted moment from this transcript" ) JavaScript import { GoogleAICacheManager , GoogleAIFileManager } from "@google/generative-ai/server" ; import { GoogleGenerativeAI } from "@google/generative-ai" ; const cacheManager = new GoogleAICacheManager ( "GOOGLE_API_KEY" ); const fileManager = new GoogleAIFileManager ( "GOOGLE_API_KEY" ); const uploadResult = await fileManager . uploadFile ( "path/to/a11.txt" , { mimeType : "text/plain" , }); const cacheResult = await cacheManager . create ({ model : "models/gemini-1.5-flash" , contents : [ { role : "user" , parts : [ { fileData : { fileUri : uploadResult . file . uri , mimeType : uploadResult . file . mimeType , }, }, ], }, ], }); console . log ( cacheResult ); const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModelFromCachedContent ( cacheResult ); const result = await model . generateContent ( "Please summarize this transcript." , ); console . log ( result . response . 
text ()); After Python import requests import pathlib from google import genai from google.genai import types client = genai . Client () # Check which models support caching. for m in client . models . list (): for action in m . supported_actions : if action == "createCachedContent" : print ( m . name ) break # Download file response = requests . get ( 'https://storage.googleapis.com/generativeai-downloads/data/a11.txt' ) pathlib . Path ( 'a11.txt' ) . write_text ( response . text ) # Upload file document = client . files . upload ( file = 'a11.txt' ) # Create cache model = 'gemini-1.5-flash-001' apollo_cache = client . caches . create ( model = model , config = { 'contents' : [ document ], 'system_instruction' : 'You are an expert at analyzing transcripts.' , }, ) # Generate response response = client . models . generate_content ( model = model , contents = 'Find a lighthearted moment from this transcript' , config = types . GenerateContentConfig ( cached_content = apollo_cache . name , ) ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const filePath = path . join ( media , "a11.txt" ); const document = await ai . files . upload ({ file : filePath , config : { mimeType : "text/plain" }, }); console . log ( "Uploaded file name:" , document . name ); const modelName = "gemini-1.5-flash" ; const contents = [ createUserContent ( createPartFromUri ( document . uri , document . mimeType )), ]; const cache = await ai . caches . create ({ model : modelName , config : { contents : contents , systemInstruction : "You are an expert at analyzing transcripts." , }, }); console . log ( "Cache created:" , cache ); const response = await ai . models . generateContent ({ model : modelName , contents : "Please summarize this transcript" , config : { cachedContent : cache . name }, }); console . log ( "Response text:" , response . text ); Count tokens Count the number of tokens in a request. Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . count_tokens ( 'The quick brown fox jumps over the lazy dog.' ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" , }); // Count tokens in a prompt without calling text generation. const countResult = await model . countTokens ( "The quick brown fox jumps over the lazy dog." , ); console . log ( countResult . totalTokens ); // 11 const generateResult = await model . generateContent ( "The quick brown fox jumps over the lazy dog." , ); // On the response for `generateContent`, use `usageMetadata` // to get separate input and output token counts // (`promptTokenCount` and `candidatesTokenCount`, respectively), // as well as the combined token count (`totalTokenCount`). console . log ( generateResult . response . usageMetadata ); // candidatesTokenCount and totalTokenCount depend on response, may vary // { promptTokenCount: 11, candidatesTokenCount: 124, totalTokenCount: 135 } After Python from google import genai client = genai . Client () response = client . models . count_tokens ( model = 'gemini-2.0-flash' , contents = 'The quick brown fox jumps over the lazy dog.' , ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const prompt = "The quick brown fox jumps over the lazy dog." ; const countTokensResponse = await ai . 
models . countTokens ({ model : "gemini-2.0-flash" , contents : prompt , }); console . log ( countTokensResponse . totalTokens ); const generateResponse = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : prompt , }); console . log ( generateResponse . usageMetadata ); Generate images Generate images: Before Python #pip install https://github.com/google-gemini/generative-ai-python@imagen import google.generativeai as genai imagen = genai . ImageGenerationModel ( "imagen-3.0-generate-001" ) gen_images = imagen . generate_images ( prompt = "Robot holding a red skateboard" , number_of_images = 1 , safety_filter_level = "block_low_and_above" , person_generation = "allow_adult" , aspect_ratio = "3:4" , ) After Python from google import genai client = genai . Client () gen_images = client . models . generate_images ( model = 'imagen-3.0-generate-001' , prompt = 'Robot holding a red skateboard' , config = types . GenerateImagesConfig ( number_of_images = 1 , safety_filter_level = "BLOCK_LOW_AND_ABOVE" , person_generation = "ALLOW_ADULT" , aspect_ratio = "3:4" , ) ) for n , image in enumerate ( gen_images . generated_images ): pathlib . Path ( f ' { n } .png' ) . write_bytes ( image . image . image_bytes ) Embed content Generate content embeddings. Before Python import google.generativeai as genai response = genai . embed_content ( model = 'models/text-embedding-004' , content = 'Hello world' ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "text-embedding-004" , }); const result = await model . embedContent ( "Hello world!" ); console . log ( result . embedding ); After Python from google import genai client = genai . Client () response = client . models . embed_content ( model = 'text-embedding-004' , contents = 'Hello world' , ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const text = "Hello World!" ; const result = await ai . models . embedContent ({ model : "text-embedding-004" , contents : text , config : { outputDimensionality : 10 }, }); console . log ( result . embeddings ); Tune a Model Create and use a tuned model. The new SDK simplifies tuning with client.tunings.tune , which launches the tuning job and polls until the job is complete. Before Python import google.generativeai as genai import random # create tuning model train_data = {} for i in range ( 1 , 6 ): key = f 'input { i } ' value = f 'output { i } ' train_data [ key ] = value name = f 'generate-num- { random . randint ( 0 , 10000 ) } ' operation = genai . create_tuned_model ( source_model = 'models/gemini-1.5-flash-001-tuning' , training_data = train_data , id = name , epoch_count = 5 , batch_size = 4 , learning_rate = 0.001 , ) # wait for tuning complete tuningProgress = operation . result () # generate content with the tuned model model = genai . GenerativeModel ( model_name = f 'tunedModels/ { name } ' ) response = model . generate_content ( '55' ) After Python from google import genai from google.genai import types client = genai . Client () # Check which models are available for tuning. for m in client . models . list (): for action in m . supported_actions : if action == "createTunedModel" : print ( m . name ) break # create tuning model training_dataset = types . TuningDataset ( examples = [ types . 
TuningExample ( text_input = f 'input { i } ' , output = f 'output { i } ' , ) for i in range ( 5 ) ], ) tuning_job = client . tunings . tune ( base_model = 'models/gemini-1.5-flash-001-tuning' , training_dataset = training_dataset , config = types . CreateTuningJobConfig ( epoch_count = 5 , batch_size = 4 , learning_rate = 0.001 , tuned_model_display_name = "test tuned model" ) ) # generate content with the tuned model response = client . models . generate_content ( model = tuning_job . tuned_model . model , contents = '55' , ) Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-09 UTC.
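Async streaming follows the same client.aio pattern described in the Async section above. The sketch below combines the two migrated features; it assumes that client.aio.models.generate_content_stream mirrors the synchronous generate_content_stream shown earlier (the method name and awaitable async-iterator behavior are assumptions, not taken from this page).

# Hedged sketch: async + streaming with the new google-genai SDK.
# Assumes client.aio.models.generate_content_stream exists and returns
# an awaitable async iterator, mirroring the sync streaming call above.
import asyncio
from google import genai

client = genai.Client()

async def main():
    stream = await client.aio.models.generate_content_stream(
        model="gemini-2.0-flash",
        contents="Tell me a story in 300 words.",
    )
    async for chunk in stream:
        print(chunk.text, end="")

asyncio.run(main())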
 
  Title: Migrate to the Google GenAI SDK  |  Gemini API  |  Google AI for Developers
  ==================================================
 
+ Migrate to the Google GenAI SDK | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Migrate to the Google GenAI SDK Starting with the Gemini 2.0 release in late 2024, we introduced a new set of libraries called the Google GenAI SDK . It offers an improved developer experience through an updated client architecture , and simplifies the transition between developer and enterprise workflows. The Google GenAI SDK is now in General Availability (GA) across all supported platforms. If you're using one of our legacy libraries , we strongly recommend you to migrate. This guide provides before-and-after examples of migrated code to help you get started. Note: The Go examples omit imports and other boilerplate code to improve readability. Installation Before Python pip install -U -q "google-generativeai" JavaScript npm install @google/generative-ai Go go get github.com/google/generative-ai-go After Python pip install -U -q "google-genai" JavaScript npm install @google/genai Go go get google.golang.org/genai API access The old SDK implicitly handled the API client behind the scenes using a variety of ad hoc methods. This made it hard to manage the client and credentials. Now, you interact through a central Client object. This Client object acts as a single entry point for various API services (e.g., models , chats , files , tunings ), promoting consistency and simplifying credential and configuration management across different API calls. Before (Less Centralized API Access) Python The old SDK didn't explicitly use a top-level client object for most API calls. You would directly instantiate and interact with GenerativeModel objects. import google.generativeai as genai # Directly create and use model objects model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ( ... ) chat = model . start_chat ( ... ) JavaScript While GoogleGenerativeAI was a central point for models and chat, other functionalities like file and cache management often required importing and instantiating entirely separate client classes. import { GoogleGenerativeAI } from "@google/generative-ai" ; import { GoogleAIFileManager , GoogleAICacheManager } from "@google/generative-ai/server" ; // For files/caching const genAI = new GoogleGenerativeAI ( "YOUR_API_KEY" ); const fileManager = new GoogleAIFileManager ( "YOUR_API_KEY" ); const cacheManager = new GoogleAICacheManager ( "YOUR_API_KEY" ); // Get a model instance, then call methods on it const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); const result = await model . generateContent (...); const chat = model . startChat (...); // Call methods on separate client objects for other services const uploadedFile = await fileManager . uploadFile (...); const cache = await cacheManager . create (...); Go The genai.NewClient function created a client, but generative model operations were typically called on a separate GenerativeModel instance obtained from this client. Other services might have been accessed via distinct packages or patterns. 
import ( "github.com/google/generative-ai-go/genai" "github.com/google/generative-ai-go/genai/fileman" // For files "google.golang.org/api/option" ) client , err := genai . NewClient ( ctx , option . WithAPIKey ( "YOUR_API_KEY" )) fileClient , err := fileman . NewClient ( ctx , option . WithAPIKey ( "YOUR_API_KEY" )) // Get a model instance, then call methods on it model := client . GenerativeModel ( "gemini-1.5-flash" ) resp , err := model . GenerateContent ( ... ) cs := model . StartChat () // Call methods on separate client objects for other services uploadedFile , err := fileClient . UploadFile ( ... ) After (Centralized Client Object) Python from google import genai # Create a single client object client = genai . Client () # Access API methods through services on the client object response = client . models . generate_content ( ... ) chat = client . chats . create ( ... ) my_file = client . files . upload ( ... ) tuning_job = client . tunings . tune ( ... ) JavaScript import { GoogleGenAI } from "@google/genai" ; // Create a single client object const ai = new GoogleGenAI ({ apiKey : "YOUR_API_KEY" }); // Access API methods through services on the client object const response = await ai . models . generateContent (...); const chat = ai . chats . create (...); const uploadedFile = await ai . files . upload (...); const cache = await ai . caches . create (...); Go import "google.golang.org/genai" // Create a single client object client , err := genai . NewClient ( ctx , nil ) // Access API methods through services on the client object result , err := client . Models . GenerateContent ( ... ) chat , err := client . Chats . Create ( ... ) uploadedFile , err := client . Files . Upload ( ... ) tuningJob , err := client . Tunings . Tune ( ... ) Authentication Both legacy and new libraries authenticate using API keys. You can create your API key in Google AI Studio. Before Python The old SDK handled the API client object implicitly. import google.generativeai as genai genai . configure ( api_key =... ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); Go Import the Google libraries: import ( "github.com/google/generative-ai-go/genai" "google.golang.org/api/option" ) Create the client: client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) After Python With Google GenAI SDK, you create an API client first, which is used to call the API. The new SDK will pick up your API key from either one of the GEMINI_API_KEY or GOOGLE_API_KEY environment variables, if you don't pass one to the client. export GEMINI_API_KEY = "YOUR_API_KEY" from google import genai client = genai . Client () # Set the API key using the GEMINI_API_KEY env var. # Alternatively, you could set the API key explicitly: # client = genai.Client(api_key="your_api_key") JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : "GEMINI_API_KEY" }); Go Import the GenAI library: import "google.golang.org/genai" Create the client: client , err := genai . NewClient ( ctx , & genai . ClientConfig { Backend : genai . BackendGeminiAPI , }) Generate content Text Before Python Previously, there were no client objects, you accessed APIs directly through GenerativeModel objects. import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ( 'Tell me a story in 300 words' ) print ( response . 
text ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( process . env . API_KEY ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); const prompt = "Tell me a story in 300 words" ; const result = await model . generateContent ( prompt ); console . log ( result . response . text ()); Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client . Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) resp , err := model . GenerateContent ( ctx , genai . Text ( "Tell me a story in 300 words." )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) // utility for printing response parts After Python The new Google GenAI SDK provides access to all the API methods through the Client object. Except for a few stateful special cases ( chat and live-api session s), these are all stateless functions. For utility and uniformity, objects returned are pydantic classes. from google import genai client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'Tell me a story in 300 words.' ) print ( response . text ) print ( response . model_dump_json ( exclude_none = True , indent = 4 )) JavaScript import { GoogleGenAI } from "@google/genai" ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : "Tell me a story in 300 words." , }); console . log ( response . text ); Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , err := client . Models . GenerateContent ( ctx , "gemini-2.0-flash" , genai . Text ( "Tell me a story in 300 words." ), nil ) if err != nil { log . Fatal ( err ) } debugPrint ( result ) // utility for printing result Image Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ([ 'Tell me a story based on this image' , Image . open ( image_path ) ]) print ( response . text ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); function fileToGenerativePart ( path , mimeType ) { return { inlineData : { data : Buffer . from ( fs . readFileSync ( path )). toString ( "base64" ), mimeType , }, }; } const prompt = "Tell me a story based on this image" ; const imagePart = fileToGenerativePart ( `path/to/organ.jpg` , "image/jpeg" , ); const result = await model . generateContent ([ prompt , imagePart ]); console . log ( result . response . text ()); Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client . Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) imgData , err := os . ReadFile ( "path/to/organ.jpg" ) if err != nil { log . Fatal ( err ) } resp , err := model . GenerateContent ( ctx , genai . Text ( "Tell me about this instrument" ), genai . ImageData ( "jpeg" , imgData )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) // utility for printing response After Python Many of the same convenience features exist in the new SDK. 
For example, PIL.Image objects are automatically converted. from google import genai from PIL import Image client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = [ 'Tell me a story based on this image' , Image . open ( image_path ) ] ) print ( response . text ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const organ = await ai . files . upload ({ file : "path/to/organ.jpg" , }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : [ createUserContent ([ "Tell me a story based on this image" , createPartFromUri ( organ . uri , organ . mimeType ) ]), ], }); console . log ( response . text ); Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } imgData , err := os . ReadFile ( "path/to/organ.jpg" ) if err != nil { log . Fatal ( err ) } parts := [] * genai . Part { { Text : "Tell me a story based on this image" }, { InlineData : & genai . Blob { Data : imgData , MIMEType : "image/jpeg" }}, } contents := [] * genai . Content { { Parts : parts }, } result , err := client . Models . GenerateContent ( ctx , "gemini-2.0-flash" , contents , nil ) if err != nil { log . Fatal ( err ) } debugPrint ( result ) // utility for printing result Streaming Before Python import google.generativeai as genai response = model . generate_content ( "Write a cute story about cats." , stream = True ) for chunk in response : print ( chunk . text ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); const prompt = "Write a story about a magic backpack." ; const result = await model . generateContentStream ( prompt ); // Print text as it comes in. for await ( const chunk of result . stream ) { const chunkText = chunk . text (); process . stdout . write ( chunkText ); } Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client . Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) iter := model . GenerateContentStream ( ctx , genai . Text ( "Write a story about a magic backpack." )) for { resp , err := iter . Next () if err == iterator . Done { break } if err != nil { log . Fatal ( err ) } printResponse ( resp ) // utility for printing the response } After Python from google import genai client = genai . Client () for chunk in client . models . generate_content_stream ( model = 'gemini-2.0-flash' , contents = 'Tell me a story in 300 words.' ): print ( chunk . text ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . generateContentStream ({ model : "gemini-2.0-flash" , contents : "Write a story about a magic backpack." , }); let text = "" ; for await ( const chunk of response ) { console . log ( chunk . text ); text += chunk . text ; } Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } for result , err := range client . Models . GenerateContentStream ( ctx , "gemini-2.0-flash" , genai . Text ( "Write a story about a magic backpack." ), nil , ) { if err != nil { log . Fatal ( err ) } fmt . Print ( result . Candidates [ 0 ]. Content . 
Parts [ 0 ]. Text ) } Configuration Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' , system_instruction = 'you are a story teller for kids under 5 years old' , generation_config = genai . GenerationConfig ( max_output_tokens = 400 , top_k = 2 , top_p = 0.5 , temperature = 0.5 , response_mime_type = 'application/json' , stop_sequences = [ ' \n ' ], ) ) response = model . generate_content ( 'tell me a story in 100 words' ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" , generationConfig : { candidateCount : 1 , stopSequences : [ "x" ], maxOutputTokens : 20 , temperature : 1.0 , }, }); const result = await model . generateContent ( "Tell me a story about a magic backpack." , ); console . log ( result . response . text ()) Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client . Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) model . SetTemperature ( 0.5 ) model . SetTopP ( 0.5 ) model . SetTopK ( 2.0 ) model . SetMaxOutputTokens ( 100 ) model . ResponseMIMEType = "application/json" resp , err := model . GenerateContent ( ctx , genai . Text ( "Tell me about New York" )) if err != nil { log . Fatal ( err ) } printResponse ( resp ) // utility for printing response After Python For all methods in the new SDK, the required arguments are provided as keyword arguments. All optional inputs are provided in the config argument. Config arguments can be specified as either Python dictionaries or Config classes in the google.genai.types namespace. For utility and uniformity, all definitions within the types module are pydantic classes. from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'Tell me a story in 100 words.' , config = types . GenerateContentConfig ( system_instruction = 'you are a story teller for kids under 5 years old' , max_output_tokens = 400 , top_k = 2 , top_p = 0.5 , temperature = 0.5 , response_mime_type = 'application/json' , stop_sequences = [ ' \n ' ], seed = 42 , ), ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : "Tell me a story about a magic backpack." , config : { candidateCount : 1 , stopSequences : [ "x" ], maxOutputTokens : 20 , temperature : 1.0 , }, }); console . log ( response . text ); Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } result , err := client . Models . GenerateContent ( ctx , "gemini-2.0-flash" , genai . Text ( "Tell me about New York" ), & genai . GenerateContentConfig { Temperature : genai . Ptr [ float32 ]( 0.5 ), TopP : genai . Ptr [ float32 ]( 0.5 ), TopK : genai . Ptr [ float32 ]( 2.0 ), ResponseMIMEType : "application/json" , StopSequences : [] string { "Yankees" }, CandidateCount : 2 , Seed : genai . Ptr [ int32 ]( 42 ), MaxOutputTokens : 128 , PresencePenalty : genai . Ptr [ float32 ]( 0.5 ), FrequencyPenalty : genai . Ptr [ float32 ]( 0.5 ), }, ) if err != nil { log . 
Fatal ( err ) } debugPrint ( result ) // utility for printing response Safety settings Generate a response with safety settings: Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ( 'say something bad' , safety_settings = { 'HATE' : 'BLOCK_ONLY_HIGH' , 'HARASSMENT' : 'BLOCK_ONLY_HIGH' , } ) JavaScript import { GoogleGenerativeAI , HarmCategory , HarmBlockThreshold } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" , safetySettings : [ { category : HarmCategory . HARM_CATEGORY_HARASSMENT , threshold : HarmBlockThreshold . BLOCK_LOW_AND_ABOVE , }, ], }); const unsafePrompt = "I support Martians Soccer Club and I think " + "Jupiterians Football Club sucks! Write an ironic phrase telling " + "them how I feel about them." ; const result = await model . generateContent ( unsafePrompt ); try { result . response . text (); } catch ( e ) { console . error ( e ); console . log ( result . response . candidates [ 0 ]. safetyRatings ); } After Python from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'say something bad' , config = types . GenerateContentConfig ( safety_settings = [ types . SafetySetting ( category = 'HARM_CATEGORY_HATE_SPEECH' , threshold = 'BLOCK_ONLY_HIGH' ), ] ), ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const unsafePrompt = "I support Martians Soccer Club and I think " + "Jupiterians Football Club sucks! Write an ironic phrase telling " + "them how I feel about them." ; const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : unsafePrompt , config : { safetySettings : [ { category : "HARM_CATEGORY_HARASSMENT" , threshold : "BLOCK_ONLY_HIGH" , }, ], }, }); console . log ( "Finish reason:" , response . candidates [ 0 ]. finishReason ); console . log ( "Safety ratings:" , response . candidates [ 0 ]. safetyRatings ); Async Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content_async ( 'tell me a story in 100 words' ) After Python To use the new SDK with asyncio , there is a separate async implementation of every method under client.aio . from google import genai client = genai . Client () response = await client . aio . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'Tell me a story in 300 words.' ) Chat Start a chat and send a message to the model: Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) chat = model . start_chat () response = chat . send_message ( "Tell me a story in 100 words" ) response = chat . send_message ( "What happened after that?" ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" }); const chat = model . startChat ({ history : [ { role : "user" , parts : [{ text : "Hello" }], }, { role : "model" , parts : [{ text : "Great to meet you. What would you like to know?" }], }, ], }); let result = await chat . sendMessage ( "I have 2 dogs in my house." ); console . log ( result . response . text ()); result = await chat . 
sendMessage ( "How many paws are in my house?" ); console . log ( result . response . text ()); Go ctx := context . Background () client , err := genai . NewClient ( ctx , option . WithAPIKey ( "GOOGLE_API_KEY" )) if err != nil { log . Fatal ( err ) } defer client . Close () model := client . GenerativeModel ( "gemini-1.5-flash" ) cs := model . StartChat () cs . History = [] * genai . Content { { Parts : [] genai . Part { genai . Text ( "Hello, I have 2 dogs in my house." ), }, Role : "user" , }, { Parts : [] genai . Part { genai . Text ( "Great to meet you. What would you like to know?" ), }, Role : "model" , }, } res , err := cs . SendMessage ( ctx , genai . Text ( "How many paws are in my house?" )) if err != nil { log . Fatal ( err ) } printResponse ( res ) // utility for printing the response After Python from google import genai client = genai . Client () chat = client . chats . create ( model = 'gemini-2.0-flash' ) response = chat . send_message ( message = 'Tell me a story in 100 words' ) response = chat . send_message ( message = 'What happened after that?' ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const chat = ai . chats . create ({ model : "gemini-2.0-flash" , history : [ { role : "user" , parts : [{ text : "Hello" }], }, { role : "model" , parts : [{ text : "Great to meet you. What would you like to know?" }], }, ], }); const response1 = await chat . sendMessage ({ message : "I have 2 dogs in my house." , }); console . log ( "Chat response 1:" , response1 . text ); const response2 = await chat . sendMessage ({ message : "How many paws are in my house?" , }); console . log ( "Chat response 2:" , response2 . text ); Go ctx := context . Background () client , err := genai . NewClient ( ctx , nil ) if err != nil { log . Fatal ( err ) } chat , err := client . Chats . Create ( ctx , "gemini-2.0-flash" , nil , nil ) if err != nil { log . Fatal ( err ) } result , err := chat . SendMessage ( ctx , genai . Part { Text : "Hello, I have 2 dogs in my house." }) if err != nil { log . Fatal ( err ) } debugPrint ( result ) // utility for printing result result , err = chat . SendMessage ( ctx , genai . Part { Text : "How many paws are in my house?" }) if err != nil { log . Fatal ( err ) } debugPrint ( result ) // utility for printing result Function calling Before Python import google.generativeai as genai from enum import Enum def get_current_weather ( location : str ) - > str : """Get the current whether in a given location. Args: location: required, The city and state, e.g. San Franciso, CA unit: celsius or fahrenheit """ print ( f 'Called with: { location =} ' ) return "23C" model = genai . GenerativeModel ( model_name = "gemini-1.5-flash" , tools = [ get_current_weather ] ) response = model . generate_content ( "What is the weather in San Francisco?" ) function_call = response . candidates [ 0 ] . parts [ 0 ] . function_call After Python In the new SDK, automatic function calling is the default. Here, you disable it. from google import genai from google.genai import types client = genai . Client () def get_current_weather ( location : str ) - > str : """Get the current whether in a given location. Args: location: required, The city and state, e.g. San Franciso, CA unit: celsius or fahrenheit """ print ( f 'Called with: { location =} ' ) return "23C" response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = "What is the weather like in Boston?" , config = types . 
GenerateContentConfig ( tools = [ get_current_weather ], automatic_function_calling = { 'disable' : True }, ), ) function_call = response . candidates [ 0 ] . content . parts [ 0 ] . function_call Automatic function calling Before Python The old SDK only supports automatic function calling in chat. In the new SDK this is the default behavior in generate_content . import google.generativeai as genai def get_current_weather ( city : str ) - > str : return "23C" model = genai . GenerativeModel ( model_name = "gemini-1.5-flash" , tools = [ get_current_weather ] ) chat = model . start_chat ( enable_automatic_function_calling = True ) result = chat . send_message ( "What is the weather in San Francisco?" ) After Python from google import genai from google.genai import types client = genai . Client () def get_current_weather ( city : str ) - > str : return "23C" response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = "What is the weather like in Boston?" , config = types . GenerateContentConfig ( tools = [ get_current_weather ] ), ) Code execution Code execution is a tool that allows the model to generate Python code, run it, and return the result. Before Python import google.generativeai as genai model = genai . GenerativeModel ( model_name = "gemini-1.5-flash" , tools = "code_execution" ) result = model . generate_content ( "What is the sum of the first 50 prime numbers? Generate and run code for " "the calculation, and make sure you get all 50." ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" , tools : [{ codeExecution : {} }], }); const result = await model . generateContent ( "What is the sum of the first 50 prime numbers? " + "Generate and run code for the calculation, and make sure you get " + "all 50." , ); console . log ( result . response . text ()); After Python from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'What is the sum of the first 50 prime numbers? Generate and run ' 'code for the calculation, and make sure you get all 50.' , config = types . GenerateContentConfig ( tools = [ types . Tool ( code_execution = types . ToolCodeExecution )], ), ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . generateContent ({ model : "gemini-2.0-pro-exp-02-05" , contents : `Write and execute code that calculates the sum of the first 50 prime numbers. Ensure that only the executable code and its resulting output are generated.` , }); // Each part may contain text, executable code, or an execution result. for ( const part of response . candidates [ 0 ]. content . parts ) { console . log ( part ); console . log ( "\n" ); } console . log ( "-" . repeat ( 80 )); // The `.text` accessor concatenates the parts into a markdown-formatted text. console . log ( "\n" , response . text ); Search grounding GoogleSearch (Gemini>=2.0) and GoogleSearchRetrieval (Gemini < 2.0) are tools that allow the model to retrieve public web data for grounding, powered by Google. Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ( contents = "what is the Google stock price?" 
, tools = 'google_search_retrieval' ) After Python from google import genai from google.genai import types client = genai . Client () response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'What is the Google stock price?' , config = types . GenerateContentConfig ( tools = [ types . Tool ( google_search = types . GoogleSearch () ) ] ) ) JSON response Generate answers in JSON format. Before Python By specifying a response_schema and setting response_mime_type="application/json" users can constrain the model to produce a JSON response following a given structure. import google.generativeai as genai import typing_extensions as typing class CountryInfo ( typing . TypedDict ): name : str population : int capital : str continent : str major_cities : list [ str ] gdp : int official_language : str total_area_sq_mi : int model = genai . GenerativeModel ( model_name = "gemini-1.5-flash" ) result = model . generate_content ( "Give me information of the United States" , generation_config = genai . GenerationConfig ( response_mime_type = "application/json" , response_schema = CountryInfo ), ) JavaScript import { GoogleGenerativeAI , SchemaType } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const schema = { description : "List of recipes" , type : SchemaType . ARRAY , items : { type : SchemaType . OBJECT , properties : { recipeName : { type : SchemaType . STRING , description : "Name of the recipe" , nullable : false , }, }, required : [ "recipeName" ], }, }; const model = genAI . getGenerativeModel ({ model : "gemini-1.5-pro" , generationConfig : { responseMimeType : "application/json" , responseSchema : schema , }, }); const result = await model . generateContent ( "List a few popular cookie recipes." , ); console . log ( result . response . text ()); After Python The new SDK uses pydantic classes to provide the schema (although you can pass a genai.types.Schema , or equivalent dict ). When possible, the SDK will parse the returned JSON, and return the result in response.parsed . If you provided a pydantic class as the schema the SDK will convert that JSON to an instance of the class. from google import genai from pydantic import BaseModel client = genai . Client () class CountryInfo ( BaseModel ): name : str population : int capital : str continent : str major_cities : list [ str ] gdp : int official_language : str total_area_sq_mi : int response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = 'Give me information of the United States.' , config = { 'response_mime_type' : 'application/json' , 'response_schema' : CountryInfo , }, ) response . parsed JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const response = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : "List a few popular cookie recipes." , config : { responseMimeType : "application/json" , responseSchema : { type : "array" , items : { type : "object" , properties : { recipeName : { type : "string" }, ingredients : { type : "array" , items : { type : "string" } }, }, required : [ "recipeName" , "ingredients" ], }, }, }, }); console . log ( response . text ); Files Upload Upload a file: Before Python import requests import pathlib import google.generativeai as genai # Download file response = requests . get ( 'https://storage.googleapis.com/generativeai-downloads/data/a11.txt' ) pathlib . Path ( 'a11.txt' ) . write_text ( response . 
text ) file = genai . upload_file ( path = 'a11.txt' ) model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . generate_content ([ 'Can you summarize this file:' , my_file ]) print ( response . text ) After Python import requests import pathlib from google import genai client = genai . Client () # Download file response = requests . get ( 'https://storage.googleapis.com/generativeai-downloads/data/a11.txt' ) pathlib . Path ( 'a11.txt' ) . write_text ( response . text ) my_file = client . files . upload ( file = 'a11.txt' ) response = client . models . generate_content ( model = 'gemini-2.0-flash' , contents = [ 'Can you summarize this file:' , my_file ] ) print ( response . text ) List and get List uploaded files and get an uploaded file with a filename: Before Python import google.generativeai as genai for file in genai . list_files (): print ( file . name ) file = genai . get_file ( name = file . name ) After Python from google import genai client = genai . Client () for file in client . files . list (): print ( file . name ) file = client . files . get ( name = file . name ) Delete Delete a file: Before Python import pathlib import google.generativeai as genai pathlib . Path ( 'dummy.txt' ) . write_text ( dummy ) dummy_file = genai . upload_file ( path = 'dummy.txt' ) file = genai . delete_file ( name = dummy_file . name ) After Python import pathlib from google import genai client = genai . Client () pathlib . Path ( 'dummy.txt' ) . write_text ( dummy ) dummy_file = client . files . upload ( file = 'dummy.txt' ) response = client . files . delete ( name = dummy_file . name ) Context caching Context caching allows the user to pass the content to the model once, cache the input tokens, and then refer to the cached tokens in subsequent calls to lower the cost. Before Python import requests import pathlib import google.generativeai as genai from google.generativeai import caching # Download file response = requests . get ( 'https://storage.googleapis.com/generativeai-downloads/data/a11.txt' ) pathlib . Path ( 'a11.txt' ) . write_text ( response . text ) # Upload file document = genai . upload_file ( path = "a11.txt" ) # Create cache apollo_cache = caching . CachedContent . create ( model = "gemini-1.5-flash-001" , system_instruction = "You are an expert at analyzing transcripts." , contents = [ document ], ) # Generate response apollo_model = genai . GenerativeModel . from_cached_content ( cached_content = apollo_cache ) response = apollo_model . generate_content ( "Find a lighthearted moment from this transcript" ) JavaScript import { GoogleAICacheManager , GoogleAIFileManager } from "@google/generative-ai/server" ; import { GoogleGenerativeAI } from "@google/generative-ai" ; const cacheManager = new GoogleAICacheManager ( "GOOGLE_API_KEY" ); const fileManager = new GoogleAIFileManager ( "GOOGLE_API_KEY" ); const uploadResult = await fileManager . uploadFile ( "path/to/a11.txt" , { mimeType : "text/plain" , }); const cacheResult = await cacheManager . create ({ model : "models/gemini-1.5-flash" , contents : [ { role : "user" , parts : [ { fileData : { fileUri : uploadResult . file . uri , mimeType : uploadResult . file . mimeType , }, }, ], }, ], }); console . log ( cacheResult ); const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModelFromCachedContent ( cacheResult ); const result = await model . generateContent ( "Please summarize this transcript." , ); console . log ( result . response . 
text ()); After Python import requests import pathlib from google import genai from google.genai import types client = genai . Client () # Check which models support caching. for m in client . models . list (): for action in m . supported_actions : if action == "createCachedContent" : print ( m . name ) break # Download file response = requests . get ( 'https://storage.googleapis.com/generativeai-downloads/data/a11.txt' ) pathlib . Path ( 'a11.txt' ) . write_text ( response . text ) # Upload file document = client . files . upload ( file = 'a11.txt' ) # Create cache model = 'gemini-1.5-flash-001' apollo_cache = client . caches . create ( model = model , config = { 'contents' : [ document ], 'system_instruction' : 'You are an expert at analyzing transcripts.' , }, ) # Generate response response = client . models . generate_content ( model = model , contents = 'Find a lighthearted moment from this transcript' , config = types . GenerateContentConfig ( cached_content = apollo_cache . name , ) ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const filePath = path . join ( media , "a11.txt" ); const document = await ai . files . upload ({ file : filePath , config : { mimeType : "text/plain" }, }); console . log ( "Uploaded file name:" , document . name ); const modelName = "gemini-1.5-flash" ; const contents = [ createUserContent ( createPartFromUri ( document . uri , document . mimeType )), ]; const cache = await ai . caches . create ({ model : modelName , config : { contents : contents , systemInstruction : "You are an expert analyzing transcripts." , }, }); console . log ( "Cache created:" , cache ); const response = await ai . models . generateContent ({ model : modelName , contents : "Please summarize this transcript" , config : { cachedContent : cache . name }, }); console . log ( "Response text:" , response . text ); Count tokens Count the number of tokens in a request. Before Python import google.generativeai as genai model = genai . GenerativeModel ( 'gemini-1.5-flash' ) response = model . count_tokens ( 'The quick brown fox jumps over the lazy dog.' ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-1.5-flash" , }); // Count tokens in a prompt without calling text generation. const countResult = await model . countTokens ( "The quick brown fox jumps over the lazy dog." , ); console . log ( countResult . totalTokens ); // 11 const generateResult = await model . generateContent ( "The quick brown fox jumps over the lazy dog." , ); // On the response for `generateContent`, use `usageMetadata` // to get separate input and output token counts // (`promptTokenCount` and `candidatesTokenCount`, respectively), // as well as the combined token count (`totalTokenCount`). console . log ( generateResult . response . usageMetadata ); // candidatesTokenCount and totalTokenCount depend on response, may vary // { promptTokenCount: 11, candidatesTokenCount: 124, totalTokenCount: 135 } After Python from google import genai client = genai . Client () response = client . models . count_tokens ( model = 'gemini-2.0-flash' , contents = 'The quick brown fox jumps over the lazy dog.' , ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const prompt = "The quick brown fox jumps over the lazy dog." ; const countTokensResponse = await ai .
models . countTokens ({ model : "gemini-2.0-flash" , contents : prompt , }); console . log ( countTokensResponse . totalTokens ); const generateResponse = await ai . models . generateContent ({ model : "gemini-2.0-flash" , contents : prompt , }); console . log ( generateResponse . usageMetadata ); Generate images Generate images: Before Python #pip install https://github.com/google-gemini/generative-ai-python@imagen import google.generativeai as genai imagen = genai . ImageGenerationModel ( "imagen-3.0-generate-001" ) gen_images = imagen . generate_images ( prompt = "Robot holding a red skateboard" , number_of_images = 1 , safety_filter_level = "block_low_and_above" , person_generation = "allow_adult" , aspect_ratio = "3:4" , ) After Python from google import genai client = genai . Client () gen_images = client . models . generate_images ( model = 'imagen-3.0-generate-001' , prompt = 'Robot holding a red skateboard' , config = types . GenerateImagesConfig ( number_of_images = 1 , safety_filter_level = "BLOCK_LOW_AND_ABOVE" , person_generation = "ALLOW_ADULT" , aspect_ratio = "3:4" , ) ) for n , image in enumerate ( gen_images . generated_images ): pathlib . Path ( f ' { n } .png' ) . write_bytes ( image . image . image_bytes ) Embed content Generate content embeddings. Before Python import google.generativeai as genai response = genai . embed_content ( model = 'models/gemini-embedding-001' , content = 'Hello world' ) JavaScript import { GoogleGenerativeAI } from "@google/generative-ai" ; const genAI = new GoogleGenerativeAI ( "GOOGLE_API_KEY" ); const model = genAI . getGenerativeModel ({ model : "gemini-embedding-001" , }); const result = await model . embedContent ( "Hello world!" ); console . log ( result . embedding ); After Python from google import genai client = genai . Client () response = client . models . embed_content ( model = 'gemini-embedding-001' , contents = 'Hello world' , ) JavaScript import { GoogleGenAI } from '@google/genai' ; const ai = new GoogleGenAI ({ apiKey : "GOOGLE_API_KEY" }); const text = "Hello World!" ; const result = await ai . models . embedContent ({ model : "gemini-embedding-001" , contents : text , config : { outputDimensionality : 10 }, }); console . log ( result . embeddings ); Tune a Model Create and use a tuned model. The new SDK simplifies tuning with client.tunings.tune , which launches the tuning job and polls until the job is complete. Before Python import google.generativeai as genai import random # create tuning model train_data = {} for i in range ( 1 , 6 ): key = f 'input { i } ' value = f 'output { i } ' train_data [ key ] = value name = f 'generate-num- { random . randint ( 0 , 10000 ) } ' operation = genai . create_tuned_model ( source_model = 'models/gemini-1.5-flash-001-tuning' , training_data = train_data , id = name , epoch_count = 5 , batch_size = 4 , learning_rate = 0.001 , ) # wait for tuning complete tuningProgress = operation . result () # generate content with the tuned model model = genai . GenerativeModel ( model_name = f 'tunedModels/ { name } ' ) response = model . generate_content ( '55' ) After Python from google import genai from google.genai import types client = genai . Client () # Check which models are available for tuning. for m in client . models . list (): for action in m . supported_actions : if action == "createTunedModel" : print ( m . name ) break # create tuning model training_dataset = types . TuningDataset ( examples = [ types . 
TuningExample ( text_input = f 'input { i } ' , output = f 'output { i } ' , ) for i in range ( 5 ) ], ) tuning_job = client . tunings . tune ( base_model = 'models/gemini-1.5-flash-001-tuning' , training_dataset = training_dataset , config = types . CreateTuningJobConfig ( epoch_count = 5 , batch_size = 4 , learning_rate = 0.001 , tuned_model_display_name = "test tuned model" ) ) # generate content with the tuned model response = client . models . generate_content ( model = tuning_job . tuned_model . model , contents = '55' , ) Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
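The per-feature snippets above all go through the same centralized client, so they compose into one short program. The following is a minimal sketch that combines a few of the calls shown on this page using the google-genai Python SDK; it assumes the GEMINI_API_KEY environment variable is set, and the model name and prompts are only illustrative.

from google import genai
from google.genai import types

# One client object; the SDK picks up GEMINI_API_KEY (or GOOGLE_API_KEY) from the environment.
client = genai.Client()

# Plain text generation with an optional generation config.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Tell me a story in 100 words.",
    config=types.GenerateContentConfig(max_output_tokens=400, temperature=0.5),
)
print(response.text)

# Count tokens for the same prompt.
token_count = client.models.count_tokens(
    model="gemini-2.0-flash",
    contents="Tell me a story in 100 words.",
)
print(token_count)

# Streaming variant of the same generation call.
for chunk in client.models.generate_content_stream(
    model="gemini-2.0-flash",
    contents="Tell me a story in 300 words.",
):
    print(chunk.text, end="")

Chat, file upload, caching, and tuning follow the same pattern through client.chats, client.files, client.caches, and client.tunings, as the sections above show.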
text_content/docs_model-tuning_41c0a534.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/model-tuning
  Title: Fine-tuning with the Gemini API  |  Google AI for Developers
  ==================================================
 
- Fine-tuning with the Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Fine-tuning with the Gemini API With the deprecation of Gemini 1.5 Flash-001 in May 2025, we no longer have a model available which supports fine-tuning in the Gemini API, but it is supported in Vertex AI . We plan to bring fine-tuning support back in the future. We would love to hear from you on our developer forum if fine-tuning is important to your use case. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-06-30 UTC.
 
  Title: Fine-tuning with the Gemini API  |  Google AI for Developers
  ==================================================
 
+ Fine-tuning with the Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Fine-tuning with the Gemini API With the deprecation of Gemini 1.5 Flash-001 in May 2025, we no longer have a model available which supports fine-tuning in the Gemini API, but it is supported in Vertex AI . We plan to bring fine-tuning support back in the future. We would love to hear from you on our developer forum if fine-tuning is important to your use case. Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-08-21 UTC.
text_content/docs_models_110171ae.txt ADDED
@@ -0,0 +1,5 @@
+ URL: https://ai.google.dev/gemini-api/docs/models#gemini-2.5-flash-image-preview
+ Title: Gemini models  |  Gemini API  |  Google AI for Developers
+ ==================================================
+
+ Gemini models | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Gemini models 2.5 Pro spark Our most powerful thinking model with maximum response accuracy and state-of-the-art performance Input audio, images, video, and text, get text responses Tackle difficult problems, analyze large databases, and more Best for complex coding, reasoning, and multimodal understanding 2.5 Flash spark Our best model in terms of price-performance, offering well-rounded capabilities. Input audio, images, video, and text, and get text responses Model thinks as needed; or, you can configure a thinking budget Best for low latency, high volume tasks that require thinking 2.5 Flash-Lite spark A Gemini 2.5 Flash model optimized for cost efficiency and low latency. Input audio, images, video, and text, and get text responses Most cost-efficient model supporting high throughput Best for real time, low latency use cases Note: Gemini 2.5 Pro and 2.5 Flash come with thinking on by default . If you're migrating from a non-thinking model such as 2.0 Pro or Flash, we recommend you to review the Thinking guide first. Model variants The Gemini API offers different models that are optimized for specific use cases. Here's a brief overview of Gemini variants that are available: Model variant Input(s) Output Optimized for Gemini 2.5 Pro gemini-2.5-pro Audio, images, videos, text, and PDF Text Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more Gemini 2.5 Flash gemini-2.5-flash Audio, images, videos, and text Text Adaptive thinking, cost efficiency Gemini 2.5 Flash-Lite gemini-2.5-flash-lite Text, image, video, audio Text Most cost-efficient model supporting high throughput Gemini 2.5 Flash Live gemini-live-2.5-flash-preview Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions Gemini 2.5 Flash Native Audio gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog Audio, videos, and text Text and audio, interleaved High quality, natural conversational audio outputs, with or without thinking Gemini 2.5 Flash Image Preview gemini-2.5-flash-image-preview Images and text Images and text Precise, conversational image generation and editing Gemini 2.5 Flash Preview TTS gemini-2.5-flash-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.5 Pro Preview TTS gemini-2.5-pro-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.0 Flash gemini-2.0-flash Audio, images, videos, and text Text Next generation features, speed, and realtime streaming. 
Gemini 2.0 Flash Preview Image Generation gemini-2.0-flash-preview-image-generation Audio, images, videos, and text Text, images Conversational image generation and editing Gemini 2.0 Flash-Lite gemini-2.0-flash-lite Audio, images, videos, and text Text Cost efficiency and low latency Gemini 2.0 Flash Live gemini-2.0-flash-live-001 Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions Gemini 1.5 Flash gemini-1.5-flash Audio, images, videos, and text Text Fast and versatile performance across a diverse variety of tasks Deprecated Gemini 1.5 Flash-8B gemini-1.5-flash-8b Audio, images, videos, and text Text High volume and lower intelligence tasks Deprecated Gemini 1.5 Pro gemini-1.5-pro Audio, images, videos, and text Text Complex reasoning tasks requiring more intelligence Deprecated You can view the rate limits for each model on the rate limits page . Gemini 2.5 Pro Gemini 2.5 Pro is our state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro save Supported data types Inputs Audio, images, video, text, and PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Function calling Supported Code execution Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported URL Context Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-pro calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Our best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large scale processing, low-latency, high volume tasks that require thinking, and agentic use cases. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash save Supported data types Inputs Text, images, video, audio Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Audio generation Not supported Caching Supported Code execution Supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Supported Thinking Supported Batch API Supported URL Context Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-flash Preview: gemini-2.5-flash-preview-05-20 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash-Lite A Gemini 2.5 Flash model optimized for cost-efficiency and high throughput. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-lite save Supported data types Inputs Text, image, video, audio, PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Function calling Supported Code execution Supported URL Context Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported URL Context Supported 123 Versions Read the model version patterns for more details. 
Stable: gemini-2.5-flash-lite Preview: gemini-2.5-flash-lite-06-17 calendar_month Latest update July 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Live The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code gemini-live-2.5-flash-preview save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported Batch API Not supported URL context Supported 123 Versions Read the model version patterns for more details. Preview: gemini-live-2.5-flash-preview calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Native Audio Our native audio dialog models, with and without thinking, available through the Live API . These models provide interactive and unstructured conversational experiences, with style and control prompting. Try native audio in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog save Supported data types Inputs Audio, video, text Output Audio and text token_auto Token limits [*] Input token limit 128,000 Output token limit 8,000 handyman Capabilities Audio generation Supported Caching Not supported Code execution Not supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Not supported Thinking Supported Batch API Not supported Tuning Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-preview-05-20 Experimental: gemini-2.5-flash-exp-native-audio-thinking-dialog calendar_month Latest update May 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Image Preview Gemini 2.5 Flash Image Preview is our latest, fastest, and most efficient natively multimodal model that lets you generate and edit images conversationally. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-image-preview save Supported data types Inputs Images and text Output Images and text token_auto Token limits [*] Input token limit 32,768 Output token limit 32,768 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported Batch API Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-image-preview calendar_month Latest update August 2025 cognition_2 Knowledge cutoff June 2025 Gemini 2.5 Flash Preview Text-to-Speech Gemini 2.5 Flash Preview TTS is our price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Flash rate limits are more restricted since it is an experimental / preview model. 
Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported Batch API Supported 123 Versions Read the model version patterns for more details. gemini-2.5-flash-preview-tts calendar_month Latest update May 2025 Gemini 2.5 Pro Preview Text-to-Speech Gemini 2.5 Pro Preview TTS is our most powerful text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Pro rate limits are more restricted since it is an experimental / preview model. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported Batch API Supported 123 Versions Read the model version patterns for more details. gemini-2.5-pro-preview-tts calendar_month Latest update May 2025 Gemini 2.0 Flash Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Not supported Live API Supported Thinking Experimental Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash Stable: gemini-2.0-flash-001 Experimental: gemini-2.0-flash-exp calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Preview Image Generation Gemini 2.0 Flash Preview Image Generation delivers improved image generation features, including generating and editing images conversationally. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash-preview-image-generation save Supported data types Inputs Audio, images, video, and text Output Text and images token_auto Token limits [*] Input token limit 32,000 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported Batch API Supported 123 Versions Read the model version patterns for more details. 
Preview: gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation is not currently supported in a number of countries in Europe, Middle East & Africa calendar_month Latest update May 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash-Lite A Gemini 2.0 Flash model optimized for cost efficiency and low latency. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash-lite save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Not supported Search Not supported Image generation Not supported Audio generation Not supported Live API Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash-lite Stable: gemini-2.0-flash-lite-001 calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Live The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash-live-001 save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported Batch API Not supported URL context Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-live-001 calendar_month Latest update April 2025 cognition_2 Knowledge cutoff August 2024 Gemini 1.5 Flash Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks. Model details Property Description id_card Model code gemini-1.5-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-latest Latest stable: gemini-1.5-flash Stable: gemini-1.5-flash-001 gemini-1.5-flash-002 calendar_month Deprecation date September 2025 calendar_month Latest update September 2024 Gemini 1.5 Flash-8B Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. 
Model details Property Description id_card Model code gemini-1.5-flash-8b save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-8b-latest Latest stable: gemini-1.5-flash-8b Stable: gemini-1.5-flash-8b-001 calendar_month Deprecation date September 2025 calendar_month Latest update October 2024 Gemini 1.5 Pro Try Gemini 2.5 Pro Preview , our most advanced Gemini model to date. Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide-range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text. Model details Property Description id_card Model code gemini-1.5-pro save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 2,097,152 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 7,200 Maximum video length 2 hours Maximum audio length Approximately 19 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-pro-latest Latest stable: gemini-1.5-pro Stable: gemini-1.5-pro-001 gemini-1.5-pro-002 calendar_month Deprecation date September 2025 calendar_month Latest update September 2024 See the examples to explore the capabilities of these model variations. [*] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words. Model version name patterns Gemini models are available in either stable , preview , or experimental versions. In your code, you can use one of the following model name formats to specify which model and version you want to use. Latest stable Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation> . For example, gemini-2.0-flash . Stable Points to a specific stable model. Stable models usually don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.0-flash-001 . Preview Points to a preview model which may not be suitable for production use, come with more restrictive rate limits, but may have billing enabled. To specify a preview version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.5-pro-preview-06-05 . Preview models are not stable and availability of model endpoints is subject to change. Experimental Points to an experimental model which may not be suitable for production use and come with more restrictive rate limits. 
We release experimental models to gather feedback and get our latest updates into the hands of developers quickly. To specify an experimental version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.0-pro-exp-02-05 . Experimental models are not stable and availability of model endpoints is subject to change. Experimental models In addition to stable models, the Gemini API offers experimental models which may not be suitable for production use and come with more restrictive rate limits. We release experimental models to gather feedback, get our latest updates into the hands of developers quickly, and highlight the pace of innovation happening at Google. What we learn from experimental launches informs how we release models more widely. An experimental model can be swapped for another without prior notice. We don't guarantee that an experimental model will become a stable model in the future. Previous experimental models As new versions or stable releases become available, we remove and replace experimental models. You can find the previous experimental models we released in the following section along with the replacement version: Model code Base model Replacement version gemini-embedding-exp-03-07 Gemini Embedding gemini-embedding-001 gemini-2.5-flash-preview-04-17 Gemini 2.5 Flash gemini-2.5-flash-preview-05-20 gemini-2.0-flash-exp-image-generation Gemini 2.0 Flash gemini-2.0-flash-preview-image-generation gemini-2.5-pro-preview-06-05 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-05-06 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-03-25 Gemini 2.5 Pro gemini-2.5-pro gemini-2.0-flash-thinking-exp-01-21 Gemini 2.5 Flash gemini-2.5-flash-preview-04-17 gemini-2.0-pro-exp-02-05 Gemini 2.0 Pro Experimental gemini-2.5-pro-preview-03-25 gemini-2.0-flash-exp Gemini 2.0 Flash gemini-2.0-flash gemini-exp-1206 Gemini 2.0 Pro gemini-2.0-pro-exp-02-05 gemini-2.0-flash-thinking-exp-1219 Gemini 2.0 Flash Thinking gemini-2.0-flash-thinking-exp-01-21 gemini-exp-1121 Gemini gemini-exp-1206 gemini-exp-1114 Gemini gemini-exp-1206 gemini-1.5-pro-exp-0827 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-pro-exp-0801 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-flash-8b-exp-0924 Gemini 1.5 Flash-8B gemini-1.5-flash-8b gemini-1.5-flash-8b-exp-0827 Gemini 1.5 Flash-8B gemini-1.5-flash-8b Supported languages Gemini models are trained to work with the following languages: Arabic ( ar ) Bengali ( bn ) Bulgarian ( bg ) Chinese simplified and traditional ( zh ) Croatian ( hr ) Czech ( cs ) Danish ( da ) Dutch ( nl ) English ( en ) Estonian ( et ) Finnish ( fi ) French ( fr ) German ( de ) Greek ( el ) Hebrew ( iw ) Hindi ( hi ) Hungarian ( hu ) Indonesian ( id ) Italian ( it ) Japanese ( ja ) Korean ( ko ) Latvian ( lv ) Lithuanian ( lt ) Norwegian ( no ) Polish ( pl ) Portuguese ( pt ) Romanian ( ro ) Russian ( ru ) Serbian ( sr ) Slovak ( sk ) Slovenian ( sl ) Spanish ( es ) Swahili ( sw ) Swedish ( sv ) Thai ( th ) Turkish ( tr ) Ukrainian ( uk ) Vietnamese ( vi ) Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-14 UTC.
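Per the version name patterns above, most production apps should pin a specific stable version (for example gemini-2.0-flash-001), while the shorter alias (gemini-2.0-flash) tracks the latest stable release. As a rough sketch of how to inspect what the API currently serves, assuming the google-genai Python SDK and an API key in the environment (the substring filter is only illustrative):

from google import genai

client = genai.Client()

# Print model identifiers, e.g. "models/gemini-2.0-flash" (latest-stable alias)
# versus "models/gemini-2.0-flash-001" (pinned stable version), with their supported actions.
for m in client.models.list():
    if "gemini-2.0-flash" in m.name:
        print(m.name, m.supported_actions)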
text_content/docs_models_1a9a1694.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/models#gemini-1.5-pro
2
  Title: Gemini models  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Gemini models | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Gemini models 2.5 Pro spark Our most powerful thinking model with maximum response accuracy and state-of-the-art performance Input audio, images, video, and text, get text responses Tackle difficult problems, analyze large databases, and more Best for complex coding, reasoning, and multimodal understanding 2.5 Flash spark Our best model in terms of price-performance, offering well-rounded capabilities. Input audio, images, video, and text, and get text responses Model thinks as needed; or, you can configure a thinking budget Best for low latency, high volume tasks that require thinking 2.5 Flash-Lite experiment A Gemini 2.5 Flash model optimized for cost efficiency and low latency. Input audio, images, video, and text, and get text responses Most cost-efficient model supporting high throughput Best for real time, low latency use cases Note: Gemini 2.5 Pro and 2.5 Flash come with thinking on by default . If you're migrating from a non-thinking model such as 2.0 Pro or Flash, we recommend you to review the Thinking guide first. Model variants The Gemini API offers different models that are optimized for specific use cases. Here's a brief overview of Gemini variants that are available: Model variant Input(s) Output Optimized for Gemini 2.5 Pro gemini-2.5-pro Audio, images, videos, text, and PDF Text Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more Gemini 2.5 Flash gemini-2.5-flash Audio, images, videos, and text Text Adaptive thinking, cost efficiency Gemini 2.5 Flash-Lite Preview gemini-2.5-flash-lite-preview-06-17 Text, image, video, audio Text Most cost-efficient model supporting high throughput Gemini 2.5 Flash Native Audio gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog Audio, videos, and text Text and audio, interleaved High quality, natural conversational audio outputs, with or without thinking Gemini 2.5 Flash Preview TTS gemini-2.5-flash-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.5 Pro Preview TTS gemini-2.5-pro-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.0 Flash gemini-2.0-flash Audio, images, videos, and text Text Next generation features, speed, and realtime streaming. 
Gemini 2.0 Flash Preview Image Generation gemini-2.0-flash-preview-image-generation Audio, images, videos, and text Text, images Conversational image generation and editing Gemini 2.0 Flash-Lite gemini-2.0-flash-lite Audio, images, videos, and text Text Cost efficiency and low latency Gemini 1.5 Flash gemini-1.5-flash Audio, images, videos, and text Text Fast and versatile performance across a diverse variety of tasks Gemini 1.5 Flash-8B gemini-1.5-flash-8b Audio, images, videos, and text Text High volume and lower intelligence tasks Gemini 1.5 Pro gemini-1.5-pro Audio, images, videos, and text Text Complex reasoning tasks requiring more intelligence Gemini Embedding gemini-embedding-exp Text Text embeddings Measuring the relatedness of text strings Imagen 4 imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 Text Images Our most up-to-date image generation model Imagen 3 imagen-3.0-generate-002 Text Images High quality image generation model Veo 2 veo-2.0-generate-001 Text, images Video High quality video generation Gemini 2.5 Flash Live gemini-live-2.5-flash-preview Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions Gemini 2.0 Flash Live gemini-2.0-flash-live-001 Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions You can view the rate limits for each model on the rate limits page . Gemini 2.5 Pro Gemini 2.5 Pro is our state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro save Supported data types Inputs Audio, images, video, text, and PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-pro Preview: gemini-2.5-pro-preview-06-05 Preview: gemini-2.5-pro-preview-05-06 Preview: gemini-2.5-pro-preview-03-25 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Our best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large scale processing, low-latency, high volume tasks that require thinking, and agentic use cases. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash save Supported data types Inputs Text, images, video, audio Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Audio generation Not supported Caching Supported Code execution Supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Supported Thinking Supported Tuning Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-flash Preview: gemini-2.5-flash-preview-05-20 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash-Lite Preview A Gemini 2.5 Flash model optimized for cost efficiency and low latency. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-lite-preview-06-17 save Supported data types Inputs Text, images, video, and audio Output Text token_auto Token limits [*] Input token limit 1,000,000 Output token limit 64,000 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported URL Context Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-lite-preview-06-17 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Native Audio Our native audio dialog models, with and without thinking, available through the Live API . These models provide interactive and unstructured conversational experiences, with style and control prompting. Try native audio in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-native-audio-dialog & models/gemini-2.5-flash-exp-native-audio-thinking-dialog save Supported data types Inputs Audio, video, text Output Audio and text token_auto Token limits [*] Input token limit 128,000 Output token limit 8,000 handyman Capabilities Audio generation Supported Caching Not supported Code execution Not supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Not supported Thinking Supported Tuning Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-preview-05-20 Experimental: gemini-2.5-flash-exp-native-audio-thinking-dialog calendar_month Latest update May 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Preview Text-to-Speech Gemini 2.5 Flash Preview TTS is our price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Flash rate limits are more restricted since it is an experimental / preview model. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-flash-preview-tts calendar_month Latest update May 2025 Gemini 2.5 Pro Preview Text-to-Speech Gemini 2.5 Pro Preview TTS is our most powerful text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Pro rate limits are more restricted since it is an experimental / preview model. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-pro-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-pro-preview-tts calendar_month Latest update May 2025 Gemini 2.0 Flash Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Not supported Live API Supported Thinking Experimental Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash Stable: gemini-2.0-flash-001 Experimental: gemini-2.0-flash-exp calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Preview Image Generation Gemini 2.0 Flash Preview Image Generation delivers improved image generation features, including generating and editing images conversationally. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-preview-image-generation save Supported data types Inputs Audio, images, video, and text Output Text and images token_auto Token limits [*] Input token limit 32,000 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation is not currently supported in a number of countries in Europe, Middle East & Africa calendar_month Latest update May 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash-Lite A Gemini 2.0 Flash model optimized for cost efficiency and low latency. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-lite save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Not supported Search Not supported Image generation Not supported Audio generation Not supported Live API Not supported Batch API Supported 123 Versions Read the model version patterns for more details. 
Latest: gemini-2.0-flash-lite Stable: gemini-2.0-flash-lite-001 calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 1.5 Flash Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-latest Latest stable: gemini-1.5-flash Stable: gemini-1.5-flash-001 gemini-1.5-flash-002 calendar_month Latest update September 2024 Gemini 1.5 Flash-8B Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash-8b save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-8b-latest Latest stable: gemini-1.5-flash-8b Stable: gemini-1.5-flash-8b-001 calendar_month Latest update October 2024 Gemini 1.5 Pro Try Gemini 2.5 Pro Preview , our most advanced Gemini model to date. Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide-range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-pro save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 2,097,152 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 7,200 Maximum video length 2 hours Maximum audio length Approximately 19 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-pro-latest Latest stable: gemini-1.5-pro Stable: gemini-1.5-pro-001 gemini-1.5-pro-002 calendar_month Latest update September 2024 Imagen 4 Imagen 4 is our latest image model, capable of generating highly detailed images with rich lighting, significantly better text rendering, and higher resolution output than previous models. 
Model details Property Description id_card Model code Gemini API imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit 480 tokens (text) Output images 1 (Ultra) 1 to 4 (Standard) calendar_month Latest update June 2025 Imagen 3 Imagen 3 is our highest quality text-to-image model, capable of generating images with even better detail, richer lighting and fewer distracting artifacts than our previous models. Model details Property Description id_card Model code Gemini API imagen-3.0-generate-002 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit N/A Output images Up to 4 calendar_month Latest update February 2025 Veo 2 Veo 2 is our high quality text- and image-to-video model, capable of generating detailed videos, capturing the artistic nuance in your prompts. Model details Property Description id_card Model code Gemini API veo-2.0-generate-001 save Supported data types Input Text, image Output Video token_auto Limits Text input N/A Image input Any image resolution and aspect ratio up to 20MB file size Output video Up to 2 calendar_month Latest update April 2025 Gemini 2.5 Flash Live The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-live-2.5-flash-preview save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-live-2.5-flash-preview calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.0 Flash Live The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-live-001 save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-live-001 calendar_month Latest update April 2025 cognition_2 Knowledge cutoff August 2024 Gemini Embedding Experimental Gemini embedding achieves a SOTA performance across many key dimensions including code, multi-lingual, and retrieval. Gemini Embedding rate limits are more restricted since it is an experimental model. 
Model details Property Description id_card Model code Gemini API gemini-embedding-exp-03-07 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 8,192 Output dimension size Elastic, supports: 3072, 1536, or 768 calendar_month Latest update March 2025 Text Embedding and Embedding Text Embedding Try our new experimental Gemini embedding model which achieves state-of-the-art performance. Text embeddings are used to measure the relatedness of strings and are widely used in many AI applications. text-embedding-004 achieves a stronger retrieval performance and outperforms existing models with comparable dimensions, on the standard MTEB embedding benchmarks. Model details Property Description id_card Model code Gemini API models/text-embedding-004 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update April 2024 Embedding Note: Text Embedding is the newer version of the Embedding model. If you're creating a new project, use Text Embedding. You can use the Embedding model to generate text embeddings for input text. The Embedding model is optimized for creating embeddings with 768 dimensions for text of up to 2,048 tokens. Embedding model details Property Description id_card Model code models/embedding-001 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update December 2023 AQA You can use the AQA model to perform Attributed Question-Answering (AQA)–related tasks over a document, corpus, or a set of passages. The AQA model returns answers to questions that are grounded in provided sources, along with estimating answerable probability. Model details Property Description id_card Model code models/aqa save Supported data types Input Text Output Text language Supported language English token_auto Token limits [*] Input token limit 7,168 Output token limit 1,024 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Supported calendar_month Latest update December 2023 See the examples to explore the capabilities of these model variations. [*] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words. Model version name patterns Gemini models are available in either stable , preview , or experimental versions. In your code, you can use one of the following model name formats to specify which model and version you want to use. Latest stable Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation> . For example, gemini-2.0-flash . Stable Points to a specific stable model. Stable models usually don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.0-flash-001 . Preview Points to a preview model which may not be suitable for production use, come with more restrictive rate limits, but may have billing enabled. 
To specify a preview version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.5-pro-preview-06-05 . Experimental Points to an experimental model which may not be suitable for production use and come with more restrictive rate limits. We release experimental models to gather feedback and get our latest updates into the hands of developers quickly. To specify an experimental version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.0-pro-exp-02-05 . Experimental models In addition to stable models, the Gemini API offers experimental models which may not be suitable for production use and come with more restrictive rate limits. We release experimental models to gather feedback, get our latest updates into the hands of developers quickly, and highlight the pace of innovation happening at Google. What we learn from experimental launches informs how we release models more widely. An experimental model can be swapped for another without prior notice. We don't guarantee that an experimental model will become a stable model in the future. Previous experimental models As new versions or stable releases become available, we remove and replace experimental models. You can find the previous experimental models we released in the following section along with the replacement version: Model code Base model Replacement version gemini-2.5-flash-preview-04-17 Gemini 2.5 Flash gemini-2.5-flash-preview-05-20 gemini-2.0-flash-exp-image-generation Gemini 2.0 Flash gemini-2.0-flash-preview-image-generation gemini-2.5-pro-preview-06-05 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-05-06 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-03-25 Gemini 2.5 Pro gemini-2.5-pro gemini-2.0-flash-thinking-exp-01-21 Gemini 2.5 Flash gemini-2.5-flash-preview-04-17 gemini-2.0-pro-exp-02-05 Gemini 2.0 Pro Experimental gemini-2.5-pro-preview-03-25 gemini-2.0-flash-exp Gemini 2.0 Flash gemini-2.0-flash gemini-exp-1206 Gemini 2.0 Pro gemini-2.0-pro-exp-02-05 gemini-2.0-flash-thinking-exp-1219 Gemini 2.0 Flash Thinking gemini-2.0-flash-thinking-exp-01-21 gemini-exp-1121 Gemini gemini-exp-1206 gemini-exp-1114 Gemini gemini-exp-1206 gemini-1.5-pro-exp-0827 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-pro-exp-0801 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-flash-8b-exp-0924 Gemini 1.5 Flash-8B gemini-1.5-flash-8b gemini-1.5-flash-8b-exp-0827 Gemini 1.5 Flash-8B gemini-1.5-flash-8b Supported languages Gemini models are trained to work with the following languages: Arabic ( ar ) Bengali ( bn ) Bulgarian ( bg ) Chinese simplified and traditional ( zh ) Croatian ( hr ) Czech ( cs ) Danish ( da ) Dutch ( nl ) English ( en ) Estonian ( et ) Finnish ( fi ) French ( fr ) German ( de ) Greek ( el ) Hebrew ( iw ) Hindi ( hi ) Hungarian ( hu ) Indonesian ( id ) Italian ( it ) Japanese ( ja ) Korean ( ko ) Latvian ( lv ) Lithuanian ( lt ) Norwegian ( no ) Polish ( pl ) Portuguese ( pt ) Romanian ( ro ) Russian ( ru ) Serbian ( sr ) Slovak ( sk ) Slovenian ( sl ) Spanish ( es ) Swahili ( sw ) Swedish ( sv ) Thai ( th ) Turkish ( tr ) Ukrainian ( uk ) Vietnamese ( vi ) Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-07 UTC.
 
2
  Title: Gemini models  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
+ Gemini models | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Gemini models 2.5 Pro spark Our most powerful thinking model with maximum response accuracy and state-of-the-art performance Input audio, images, video, and text, get text responses Tackle difficult problems, analyze large databases, and more Best for complex coding, reasoning, and multimodal understanding 2.5 Flash spark Our best model in terms of price-performance, offering well-rounded capabilities. Input audio, images, video, and text, and get text responses Model thinks as needed; or, you can configure a thinking budget Best for low latency, high volume tasks that require thinking 2.5 Flash-Lite spark A Gemini 2.5 Flash model optimized for cost efficiency and low latency. Input audio, images, video, and text, and get text responses Most cost-efficient model supporting high throughput Best for real time, low latency use cases Note: Gemini 2.5 Pro and 2.5 Flash come with thinking on by default . If you're migrating from a non-thinking model such as 2.0 Pro or Flash, we recommend you to review the Thinking guide first. Model variants The Gemini API offers different models that are optimized for specific use cases. Here's a brief overview of Gemini variants that are available: Model variant Input(s) Output Optimized for Gemini 2.5 Pro gemini-2.5-pro Audio, images, videos, text, and PDF Text Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more Gemini 2.5 Flash gemini-2.5-flash Audio, images, videos, and text Text Adaptive thinking, cost efficiency Gemini 2.5 Flash-Lite gemini-2.5-flash-lite Text, image, video, audio Text Most cost-efficient model supporting high throughput Gemini 2.5 Flash Live gemini-live-2.5-flash-preview Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions Gemini 2.5 Flash Native Audio gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog Audio, videos, and text Text and audio, interleaved High quality, natural conversational audio outputs, with or without thinking Gemini 2.5 Flash Image Preview gemini-2.5-flash-image-preview Images and text Images and text Precise, conversational image generation and editing Gemini 2.5 Flash Preview TTS gemini-2.5-flash-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.5 Pro Preview TTS gemini-2.5-pro-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.0 Flash gemini-2.0-flash Audio, images, videos, and text Text Next generation features, speed, and realtime streaming. 
Gemini 2.0 Flash Preview Image Generation gemini-2.0-flash-preview-image-generation Audio, images, videos, and text Text, images Conversational image generation and editing Gemini 2.0 Flash-Lite gemini-2.0-flash-lite Audio, images, videos, and text Text Cost efficiency and low latency Gemini 2.0 Flash Live gemini-2.0-flash-live-001 Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions Gemini 1.5 Flash gemini-1.5-flash Audio, images, videos, and text Text Fast and versatile performance across a diverse variety of tasks Deprecated Gemini 1.5 Flash-8B gemini-1.5-flash-8b Audio, images, videos, and text Text High volume and lower intelligence tasks Deprecated Gemini 1.5 Pro gemini-1.5-pro Audio, images, videos, and text Text Complex reasoning tasks requiring more intelligence Deprecated You can view the rate limits for each model on the rate limits page . Gemini 2.5 Pro Gemini 2.5 Pro is our state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro save Supported data types Inputs Audio, images, video, text, and PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Function calling Supported Code execution Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported URL Context Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-pro calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Our best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large scale processing, low-latency, high volume tasks that require thinking, and agentic use cases. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash save Supported data types Inputs Text, images, video, audio Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Audio generation Not supported Caching Supported Code execution Supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Supported Thinking Supported Batch API Supported URL Context Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-flash Preview: gemini-2.5-flash-preview-05-20 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash-Lite A Gemini 2.5 Flash model optimized for cost-efficiency and high throughput. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-lite save Supported data types Inputs Text, image, video, audio, PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Function calling Supported Code execution Supported URL Context Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported URL Context Supported 123 Versions Read the model version patterns for more details. 
Stable: gemini-2.5-flash-lite Preview: gemini-2.5-flash-lite-06-17 calendar_month Latest update July 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Live The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code gemini-live-2.5-flash-preview save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported Batch API Not supported URL context Supported 123 Versions Read the model version patterns for more details. Preview: gemini-live-2.5-flash-preview calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Native Audio Our native audio dialog models, with and without thinking, available through the Live API . These models provide interactive and unstructured conversational experiences, with style and control prompting. Try native audio in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog save Supported data types Inputs Audio, video, text Output Audio and text token_auto Token limits [*] Input token limit 128,000 Output token limit 8,000 handyman Capabilities Audio generation Supported Caching Not supported Code execution Not supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Not supported Thinking Supported Batch API Not supported Tuning Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-preview-05-20 Experimental: gemini-2.5-flash-exp-native-audio-thinking-dialog calendar_month Latest update May 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Image Preview Gemini 2.5 Flash Image Preview is our latest, fastest, and most efficient natively multimodal model that lets you generate and edit images conversationally. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-image-preview save Supported data types Inputs Images and text Output Images and text token_auto Token limits [*] Input token limit 32,768 Output token limit 32,768 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported Batch API Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-image-preview calendar_month Latest update August 2025 cognition_2 Knowledge cutoff June 2025 Gemini 2.5 Flash Preview Text-to-Speech Gemini 2.5 Flash Preview TTS is our price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Flash rate limits are more restricted since it is an experimental / preview model. 
Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported Batch API Supported 123 Versions Read the model version patterns for more details. gemini-2.5-flash-preview-tts calendar_month Latest update May 2025 Gemini 2.5 Pro Preview Text-to-Speech Gemini 2.5 Pro Preview TTS is our most powerful text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Pro rate limits are more restricted since it is an experimental / preview model. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported Batch API Supported 123 Versions Read the model version patterns for more details. gemini-2.5-pro-preview-tts calendar_month Latest update May 2025 Gemini 2.0 Flash Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Not supported Live API Supported Thinking Experimental Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash Stable: gemini-2.0-flash-001 Experimental: gemini-2.0-flash-exp calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Preview Image Generation Gemini 2.0 Flash Preview Image Generation delivers improved image generation features, including generating and editing images conversationally. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash-preview-image-generation save Supported data types Inputs Audio, images, video, and text Output Text and images token_auto Token limits [*] Input token limit 32,000 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported Batch API Supported 123 Versions Read the model version patterns for more details. 
Preview: gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation is not currently supported in a number of countries in Europe, Middle East & Africa calendar_month Latest update May 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash-Lite A Gemini 2.0 Flash model optimized for cost efficiency and low latency. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash-lite save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Not supported Search Not supported Image generation Not supported Audio generation Not supported Live API Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash-lite Stable: gemini-2.0-flash-lite-001 calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Live The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash-live-001 save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported Batch API Not supported URL context Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-live-001 calendar_month Latest update April 2025 cognition_2 Knowledge cutoff August 2024 Gemini 1.5 Flash Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks. Model details Property Description id_card Model code gemini-1.5-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-latest Latest stable: gemini-1.5-flash Stable: gemini-1.5-flash-001 gemini-1.5-flash-002 calendar_month Deprecation date September 2025 calendar_month Latest update September 2024 Gemini 1.5 Flash-8B Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. 
Model details Property Description id_card Model code gemini-1.5-flash-8b save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-8b-latest Latest stable: gemini-1.5-flash-8b Stable: gemini-1.5-flash-8b-001 calendar_month Deprecation date September 2025 calendar_month Latest update October 2024 Gemini 1.5 Pro Try Gemini 2.5 Pro Preview , our most advanced Gemini model to date. Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide-range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text. Model details Property Description id_card Model code gemini-1.5-pro save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 2,097,152 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 7,200 Maximum video length 2 hours Maximum audio length Approximately 19 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-pro-latest Latest stable: gemini-1.5-pro Stable: gemini-1.5-pro-001 gemini-1.5-pro-002 calendar_month Deprecation date September 2025 calendar_month Latest update September 2024 See the examples to explore the capabilities of these model variations. [*] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words. Model version name patterns Gemini models are available in either stable , preview , or experimental versions. In your code, you can use one of the following model name formats to specify which model and version you want to use. Latest stable Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation> . For example, gemini-2.0-flash . Stable Points to a specific stable model. Stable models usually don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.0-flash-001 . Preview Points to a preview model which may not be suitable for production use, come with more restrictive rate limits, but may have billing enabled. To specify a preview version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.5-pro-preview-06-05 . Preview models are not stable and availability of model endpoints is subject to change. Experimental Points to an experimental model which may not be suitable for production use and come with more restrictive rate limits. 
We release experimental models to gather feedback and get our latest updates into the hands of developers quickly. To specify an experimental version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.0-pro-exp-02-05 . Experimental models are not stable and availability of model endpoints is subject to change. Experimental models In addition to stable models, the Gemini API offers experimental models which may not be suitable for production use and come with more restrictive rate limits. We release experimental models to gather feedback, get our latest updates into the hands of developers quickly, and highlight the pace of innovation happening at Google. What we learn from experimental launches informs how we release models more widely. An experimental model can be swapped for another without prior notice. We don't guarantee that an experimental model will become a stable model in the future. Previous experimental models As new versions or stable releases become available, we remove and replace experimental models. You can find the previous experimental models we released in the following section along with the replacement version: Model code Base model Replacement version gemini-embedding-exp-03-07 Gemini Embedding gemini-embedding-001 gemini-2.5-flash-preview-04-17 Gemini 2.5 Flash gemini-2.5-flash-preview-05-20 gemini-2.0-flash-exp-image-generation Gemini 2.0 Flash gemini-2.0-flash-preview-image-generation gemini-2.5-pro-preview-06-05 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-05-06 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-03-25 Gemini 2.5 Pro gemini-2.5-pro gemini-2.0-flash-thinking-exp-01-21 Gemini 2.5 Flash gemini-2.5-flash-preview-04-17 gemini-2.0-pro-exp-02-05 Gemini 2.0 Pro Experimental gemini-2.5-pro-preview-03-25 gemini-2.0-flash-exp Gemini 2.0 Flash gemini-2.0-flash gemini-exp-1206 Gemini 2.0 Pro gemini-2.0-pro-exp-02-05 gemini-2.0-flash-thinking-exp-1219 Gemini 2.0 Flash Thinking gemini-2.0-flash-thinking-exp-01-21 gemini-exp-1121 Gemini gemini-exp-1206 gemini-exp-1114 Gemini gemini-exp-1206 gemini-1.5-pro-exp-0827 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-pro-exp-0801 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-flash-8b-exp-0924 Gemini 1.5 Flash-8B gemini-1.5-flash-8b gemini-1.5-flash-8b-exp-0827 Gemini 1.5 Flash-8B gemini-1.5-flash-8b Supported languages Gemini models are trained to work with the following languages: Arabic ( ar ) Bengali ( bn ) Bulgarian ( bg ) Chinese simplified and traditional ( zh ) Croatian ( hr ) Czech ( cs ) Danish ( da ) Dutch ( nl ) English ( en ) Estonian ( et ) Finnish ( fi ) French ( fr ) German ( de ) Greek ( el ) Hebrew ( iw ) Hindi ( hi ) Hungarian ( hu ) Indonesian ( id ) Italian ( it ) Japanese ( ja ) Korean ( ko ) Latvian ( lv ) Lithuanian ( lt ) Norwegian ( no ) Polish ( pl ) Portuguese ( pt ) Romanian ( ro ) Russian ( ru ) Serbian ( sr ) Slovak ( sk ) Slovenian ( sl ) Spanish ( es ) Swahili ( sw ) Swedish ( sv ) Thai ( th ) Turkish ( tr ) Ukrainian ( uk ) Vietnamese ( vi ) Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-14 UTC.
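Because the updated page above changes several model codes and limits (for example, gemini-2.5-flash-lite moving from preview to stable), it can be safer to query this metadata from the API at runtime than to copy values from the tables. The following is a hedged sketch, assuming the same google-genai Python client used earlier in this document; the metadata field names (such as input_token_limit) are assumptions about the SDK's Model type rather than values taken from this page.

# Minimal sketch: look up model metadata instead of hard-coding table values.
from google import genai

client = genai.Client()  # GEMINI_API_KEY read from the environment

# Fetch one model by its code, e.g. the stable gemini-2.5-flash entry above.
model = client.models.get(model="gemini-2.5-flash")
print(model.name)
# Field names below are assumptions; fall back gracefully if they differ.
print(getattr(model, "input_token_limit", "n/a"),
      getattr(model, "output_token_limit", "n/a"))

# Or enumerate the models visible to this API key and print their codes.
for m in client.models.list():
    print(m.name)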
text_content/docs_models_302dc48c.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/models#gemini-2.5-flash-native-audio
2
  Title: Gemini models  |  Gemini API  |  Google AI for Developers
3
  ==================================================
4
 
5
- Gemini models | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Gemini models 2.5 Pro spark Our most powerful thinking model with maximum response accuracy and state-of-the-art performance Input audio, images, video, and text, get text responses Tackle difficult problems, analyze large databases, and more Best for complex coding, reasoning, and multimodal understanding 2.5 Flash spark Our best model in terms of price-performance, offering well-rounded capabilities. Input audio, images, video, and text, and get text responses Model thinks as needed; or, you can configure a thinking budget Best for low latency, high volume tasks that require thinking 2.5 Flash-Lite experiment A Gemini 2.5 Flash model optimized for cost efficiency and low latency. Input audio, images, video, and text, and get text responses Most cost-efficient model supporting high throughput Best for real time, low latency use cases Note: Gemini 2.5 Pro and 2.5 Flash come with thinking on by default . If you're migrating from a non-thinking model such as 2.0 Pro or Flash, we recommend you to review the Thinking guide first. Model variants The Gemini API offers different models that are optimized for specific use cases. Here's a brief overview of Gemini variants that are available: Model variant Input(s) Output Optimized for Gemini 2.5 Pro gemini-2.5-pro Audio, images, videos, text, and PDF Text Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more Gemini 2.5 Flash gemini-2.5-flash Audio, images, videos, and text Text Adaptive thinking, cost efficiency Gemini 2.5 Flash-Lite Preview gemini-2.5-flash-lite-preview-06-17 Text, image, video, audio Text Most cost-efficient model supporting high throughput Gemini 2.5 Flash Native Audio gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog Audio, videos, and text Text and audio, interleaved High quality, natural conversational audio outputs, with or without thinking Gemini 2.5 Flash Preview TTS gemini-2.5-flash-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.5 Pro Preview TTS gemini-2.5-pro-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.0 Flash gemini-2.0-flash Audio, images, videos, and text Text Next generation features, speed, and realtime streaming. 
Gemini 2.0 Flash Preview Image Generation gemini-2.0-flash-preview-image-generation Audio, images, videos, and text Text, images Conversational image generation and editing Gemini 2.0 Flash-Lite gemini-2.0-flash-lite Audio, images, videos, and text Text Cost efficiency and low latency Gemini 1.5 Flash gemini-1.5-flash Audio, images, videos, and text Text Fast and versatile performance across a diverse variety of tasks Gemini 1.5 Flash-8B gemini-1.5-flash-8b Audio, images, videos, and text Text High volume and lower intelligence tasks Gemini 1.5 Pro gemini-1.5-pro Audio, images, videos, and text Text Complex reasoning tasks requiring more intelligence Gemini Embedding gemini-embedding-exp Text Text embeddings Measuring the relatedness of text strings Imagen 4 imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 Text Images Our most up-to-date image generation model Imagen 3 imagen-3.0-generate-002 Text Images High quality image generation model Veo 2 veo-2.0-generate-001 Text, images Video High quality video generation Gemini 2.5 Flash Live gemini-live-2.5-flash-preview Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions Gemini 2.0 Flash Live gemini-2.0-flash-live-001 Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions You can view the rate limits for each model on the rate limits page . Gemini 2.5 Pro Gemini 2.5 Pro is our state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro save Supported data types Inputs Audio, images, video, text, and PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-pro Preview: gemini-2.5-pro-preview-06-05 Preview: gemini-2.5-pro-preview-05-06 Preview: gemini-2.5-pro-preview-03-25 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Our best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large scale processing, low-latency, high volume tasks that require thinking, and agentic use cases. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash save Supported data types Inputs Text, images, video, audio Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Audio generation Not supported Caching Supported Code execution Supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Supported Thinking Supported Tuning Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-flash Preview: gemini-2.5-flash-preview-05-20 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash-Lite Preview A Gemini 2.5 Flash model optimized for cost efficiency and low latency. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-lite-preview-06-17 save Supported data types Inputs Text, images, video, and audio Output Text token_auto Token limits [*] Input token limit 1,000,000 Output token limit 64,000 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported URL Context Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-lite-preview-06-17 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Native Audio Our native audio dialog models, with and without thinking, available through the Live API . These models provide interactive and unstructured conversational experiences, with style and control prompting. Try native audio in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-native-audio-dialog & models/gemini-2.5-flash-exp-native-audio-thinking-dialog save Supported data types Inputs Audio, video, text Output Audio and text token_auto Token limits [*] Input token limit 128,000 Output token limit 8,000 handyman Capabilities Audio generation Supported Caching Not supported Code execution Not supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Not supported Thinking Supported Tuning Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-preview-05-20 Experimental: gemini-2.5-flash-exp-native-audio-thinking-dialog calendar_month Latest update May 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Preview Text-to-Speech Gemini 2.5 Flash Preview TTS is our price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Flash rate limits are more restricted since it is an experimental / preview model. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-flash-preview-tts calendar_month Latest update May 2025 Gemini 2.5 Pro Preview Text-to-Speech Gemini 2.5 Pro Preview TTS is our most powerful text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Pro rate limits are more restricted since it is an experimental / preview model. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-pro-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-pro-preview-tts calendar_month Latest update May 2025 Gemini 2.0 Flash Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Not supported Live API Supported Thinking Experimental Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash Stable: gemini-2.0-flash-001 Experimental: gemini-2.0-flash-exp calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Preview Image Generation Gemini 2.0 Flash Preview Image Generation delivers improved image generation features, including generating and editing images conversationally. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-preview-image-generation save Supported data types Inputs Audio, images, video, and text Output Text and images token_auto Token limits [*] Input token limit 32,000 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation is not currently supported in a number of countries in Europe, Middle East & Africa calendar_month Latest update May 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash-Lite A Gemini 2.0 Flash model optimized for cost efficiency and low latency. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-lite save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Not supported Search Not supported Image generation Not supported Audio generation Not supported Live API Not supported Batch API Supported 123 Versions Read the model version patterns for more details. 
Latest: gemini-2.0-flash-lite Stable: gemini-2.0-flash-lite-001 calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 1.5 Flash Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-latest Latest stable: gemini-1.5-flash Stable: gemini-1.5-flash-001 gemini-1.5-flash-002 calendar_month Latest update September 2024 Gemini 1.5 Flash-8B Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash-8b save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-8b-latest Latest stable: gemini-1.5-flash-8b Stable: gemini-1.5-flash-8b-001 calendar_month Latest update October 2024 Gemini 1.5 Pro Try Gemini 2.5 Pro Preview , our most advanced Gemini model to date. Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide-range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-pro save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 2,097,152 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 7,200 Maximum video length 2 hours Maximum audio length Approximately 19 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-pro-latest Latest stable: gemini-1.5-pro Stable: gemini-1.5-pro-001 gemini-1.5-pro-002 calendar_month Latest update September 2024 Imagen 4 Imagen 4 is our latest image model, capable of generating highly detailed images with rich lighting, significantly better text rendering, and higher resolution output than previous models. 
Model details Property Description id_card Model code Gemini API imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit 480 tokens (text) Output images 1 (Ultra) 1 to 4 (Standard) calendar_month Latest update June 2025 Imagen 3 Imagen 3 is our highest quality text-to-image model, capable of generating images with even better detail, richer lighting and fewer distracting artifacts than our previous models. Model details Property Description id_card Model code Gemini API imagen-3.0-generate-002 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit N/A Output images Up to 4 calendar_month Latest update February 2025 Veo 2 Veo 2 is our high quality text- and image-to-video model, capable of generating detailed videos, capturing the artistic nuance in your prompts. Model details Property Description id_card Model code Gemini API veo-2.0-generate-001 save Supported data types Input Text, image Output Video token_auto Limits Text input N/A Image input Any image resolution and aspect ratio up to 20MB file size Output video Up to 2 calendar_month Latest update April 2025 Gemini 2.5 Flash Live The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-live-2.5-flash-preview save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-live-2.5-flash-preview calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.0 Flash Live The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-live-001 save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-live-001 calendar_month Latest update April 2025 cognition_2 Knowledge cutoff August 2024 Gemini Embedding Experimental Gemini embedding achieves a SOTA performance across many key dimensions including code, multi-lingual, and retrieval. Gemini Embedding rate limits are more restricted since it is an experimental model. 
Model details Property Description id_card Model code Gemini API gemini-embedding-exp-03-07 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 8,192 Output dimension size Elastic, supports: 3072, 1536, or 768 calendar_month Latest update March 2025 Text Embedding and Embedding Text Embedding Try our new experimental Gemini embedding model which achieves state-of-the-art performance. Text embeddings are used to measure the relatedness of strings and are widely used in many AI applications. text-embedding-004 achieves a stronger retrieval performance and outperforms existing models with comparable dimensions, on the standard MTEB embedding benchmarks. Model details Property Description id_card Model code Gemini API models/text-embedding-004 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update April 2024 Embedding Note: Text Embedding is the newer version of the Embedding model. If you're creating a new project, use Text Embedding. You can use the Embedding model to generate text embeddings for input text. The Embedding model is optimized for creating embeddings with 768 dimensions for text of up to 2,048 tokens. Embedding model details Property Description id_card Model code models/embedding-001 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update December 2023 AQA You can use the AQA model to perform Attributed Question-Answering (AQA)–related tasks over a document, corpus, or a set of passages. The AQA model returns answers to questions that are grounded in provided sources, along with estimating answerable probability. Model details Property Description id_card Model code models/aqa save Supported data types Input Text Output Text language Supported language English token_auto Token limits [*] Input token limit 7,168 Output token limit 1,024 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Supported calendar_month Latest update December 2023 See the examples to explore the capabilities of these model variations. [*] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words. Model version name patterns Gemini models are available in either stable , preview , or experimental versions. In your code, you can use one of the following model name formats to specify which model and version you want to use. Latest stable Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation> . For example, gemini-2.0-flash . Stable Points to a specific stable model. Stable models usually don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.0-flash-001 . Preview Points to a preview model which may not be suitable for production use, come with more restrictive rate limits, but may have billing enabled. 
To specify a preview version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.5-pro-preview-06-05 . Experimental Points to an experimental model which may not be suitable for production use and come with more restrictive rate limits. We release experimental models to gather feedback and get our latest updates into the hands of developers quickly. To specify an experimental version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.0-pro-exp-02-05 . Experimental models In addition to stable models, the Gemini API offers experimental models which may not be suitable for production use and come with more restrictive rate limits. We release experimental models to gather feedback, get our latest updates into the hands of developers quickly, and highlight the pace of innovation happening at Google. What we learn from experimental launches informs how we release models more widely. An experimental model can be swapped for another without prior notice. We don't guarantee that an experimental model will become a stable model in the future. Previous experimental models As new versions or stable releases become available, we remove and replace experimental models. You can find the previous experimental models we released in the following section along with the replacement version: Model code Base model Replacement version gemini-2.5-flash-preview-04-17 Gemini 2.5 Flash gemini-2.5-flash-preview-05-20 gemini-2.0-flash-exp-image-generation Gemini 2.0 Flash gemini-2.0-flash-preview-image-generation gemini-2.5-pro-preview-06-05 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-05-06 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-03-25 Gemini 2.5 Pro gemini-2.5-pro gemini-2.0-flash-thinking-exp-01-21 Gemini 2.5 Flash gemini-2.5-flash-preview-04-17 gemini-2.0-pro-exp-02-05 Gemini 2.0 Pro Experimental gemini-2.5-pro-preview-03-25 gemini-2.0-flash-exp Gemini 2.0 Flash gemini-2.0-flash gemini-exp-1206 Gemini 2.0 Pro gemini-2.0-pro-exp-02-05 gemini-2.0-flash-thinking-exp-1219 Gemini 2.0 Flash Thinking gemini-2.0-flash-thinking-exp-01-21 gemini-exp-1121 Gemini gemini-exp-1206 gemini-exp-1114 Gemini gemini-exp-1206 gemini-1.5-pro-exp-0827 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-pro-exp-0801 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-flash-8b-exp-0924 Gemini 1.5 Flash-8B gemini-1.5-flash-8b gemini-1.5-flash-8b-exp-0827 Gemini 1.5 Flash-8B gemini-1.5-flash-8b Supported languages Gemini models are trained to work with the following languages: Arabic ( ar ) Bengali ( bn ) Bulgarian ( bg ) Chinese simplified and traditional ( zh ) Croatian ( hr ) Czech ( cs ) Danish ( da ) Dutch ( nl ) English ( en ) Estonian ( et ) Finnish ( fi ) French ( fr ) German ( de ) Greek ( el ) Hebrew ( iw ) Hindi ( hi ) Hungarian ( hu ) Indonesian ( id ) Italian ( it ) Japanese ( ja ) Korean ( ko ) Latvian ( lv ) Lithuanian ( lt ) Norwegian ( no ) Polish ( pl ) Portuguese ( pt ) Romanian ( ro ) Russian ( ru ) Serbian ( sr ) Slovak ( sk ) Slovenian ( sl ) Spanish ( es ) Swahili ( sw ) Swedish ( sv ) Thai ( th ) Turkish ( tr ) Ukrainian ( uk ) Vietnamese ( vi ) Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-07 UTC.
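To make the version name patterns above concrete, the following is a minimal sketch, assuming the google-genai Python SDK is installed and a GEMINI_API_KEY environment variable is set; the model IDs are the pattern examples from the text above and the prompt string is illustrative only.

# Sketch: the version name pattern is just the string passed as the model argument.
# Assumes google-genai SDK + GEMINI_API_KEY in the environment; prompt is illustrative.
from google import genai

client = genai.Client()  # API key is picked up from GEMINI_API_KEY

model_ids = [
    "gemini-2.0-flash",              # latest stable: <model>-<generation>-<variation>
    "gemini-2.0-flash-001",          # pinned stable: <model>-<generation>-<variation>-<version>
    "gemini-2.5-pro-preview-06-05",  # preview: more restrictive limits, may change
]

for model_id in model_ids:
    response = client.models.generate_content(
        model=model_id,
        contents="Reply with the single word OK.",
    )
    print(model_id, "->", response.text)

Pinned stable IDs are the safest choice for production, since preview and experimental IDs follow the replacement schedule listed in the "Previous experimental models" table.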
 
  Title: Gemini models  |  Gemini API  |  Google AI for Developers
  ==================================================
 
+ Gemini models | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Gemini models 2.5 Pro spark Our most powerful thinking model with maximum response accuracy and state-of-the-art performance Input audio, images, video, and text, get text responses Tackle difficult problems, analyze large databases, and more Best for complex coding, reasoning, and multimodal understanding 2.5 Flash spark Our best model in terms of price-performance, offering well-rounded capabilities. Input audio, images, video, and text, and get text responses Model thinks as needed; or, you can configure a thinking budget Best for low latency, high volume tasks that require thinking 2.5 Flash-Lite spark A Gemini 2.5 Flash model optimized for cost efficiency and low latency. Input audio, images, video, and text, and get text responses Most cost-efficient model supporting high throughput Best for real time, low latency use cases Note: Gemini 2.5 Pro and 2.5 Flash come with thinking on by default . If you're migrating from a non-thinking model such as 2.0 Pro or Flash, we recommend you to review the Thinking guide first. Model variants The Gemini API offers different models that are optimized for specific use cases. Here's a brief overview of Gemini variants that are available: Model variant Input(s) Output Optimized for Gemini 2.5 Pro gemini-2.5-pro Audio, images, videos, text, and PDF Text Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more Gemini 2.5 Flash gemini-2.5-flash Audio, images, videos, and text Text Adaptive thinking, cost efficiency Gemini 2.5 Flash-Lite gemini-2.5-flash-lite Text, image, video, audio Text Most cost-efficient model supporting high throughput Gemini 2.5 Flash Live gemini-live-2.5-flash-preview Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions Gemini 2.5 Flash Native Audio gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog Audio, videos, and text Text and audio, interleaved High quality, natural conversational audio outputs, with or without thinking Gemini 2.5 Flash Image Preview gemini-2.5-flash-image-preview Images and text Images and text Precise, conversational image generation and editing Gemini 2.5 Flash Preview TTS gemini-2.5-flash-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.5 Pro Preview TTS gemini-2.5-pro-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.0 Flash gemini-2.0-flash Audio, images, videos, and text Text Next generation features, speed, and realtime streaming. 
Gemini 2.0 Flash Preview Image Generation gemini-2.0-flash-preview-image-generation Audio, images, videos, and text Text, images Conversational image generation and editing Gemini 2.0 Flash-Lite gemini-2.0-flash-lite Audio, images, videos, and text Text Cost efficiency and low latency Gemini 2.0 Flash Live gemini-2.0-flash-live-001 Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions Gemini 1.5 Flash gemini-1.5-flash Audio, images, videos, and text Text Fast and versatile performance across a diverse variety of tasks Deprecated Gemini 1.5 Flash-8B gemini-1.5-flash-8b Audio, images, videos, and text Text High volume and lower intelligence tasks Deprecated Gemini 1.5 Pro gemini-1.5-pro Audio, images, videos, and text Text Complex reasoning tasks requiring more intelligence Deprecated You can view the rate limits for each model on the rate limits page . Gemini 2.5 Pro Gemini 2.5 Pro is our state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro save Supported data types Inputs Audio, images, video, text, and PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Function calling Supported Code execution Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported URL Context Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-pro calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Our best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large scale processing, low-latency, high volume tasks that require thinking, and agentic use cases. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash save Supported data types Inputs Text, images, video, audio Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Audio generation Not supported Caching Supported Code execution Supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Supported Thinking Supported Batch API Supported URL Context Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-flash Preview: gemini-2.5-flash-preview-05-20 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash-Lite A Gemini 2.5 Flash model optimized for cost-efficiency and high throughput. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-lite save Supported data types Inputs Text, image, video, audio, PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Function calling Supported Code execution Supported URL Context Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported URL Context Supported 123 Versions Read the model version patterns for more details. 
Stable: gemini-2.5-flash-lite Preview: gemini-2.5-flash-lite-06-17 calendar_month Latest update July 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Live The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code gemini-live-2.5-flash-preview save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported Batch API Not supported URL context Supported 123 Versions Read the model version patterns for more details. Preview: gemini-live-2.5-flash-preview calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Native Audio Our native audio dialog models, with and without thinking, available through the Live API . These models provide interactive and unstructured conversational experiences, with style and control prompting. Try native audio in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog save Supported data types Inputs Audio, video, text Output Audio and text token_auto Token limits [*] Input token limit 128,000 Output token limit 8,000 handyman Capabilities Audio generation Supported Caching Not supported Code execution Not supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Not supported Thinking Supported Batch API Not supported Tuning Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-preview-05-20 Experimental: gemini-2.5-flash-exp-native-audio-thinking-dialog calendar_month Latest update May 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Image Preview Gemini 2.5 Flash Image Preview is our latest, fastest, and most efficient natively multimodal model that lets you generate and edit images conversationally. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-image-preview save Supported data types Inputs Images and text Output Images and text token_auto Token limits [*] Input token limit 32,768 Output token limit 32,768 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported Batch API Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-image-preview calendar_month Latest update August 2025 cognition_2 Knowledge cutoff June 2025 Gemini 2.5 Flash Preview Text-to-Speech Gemini 2.5 Flash Preview TTS is our price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Flash rate limits are more restricted since it is an experimental / preview model. 
Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported Batch API Supported 123 Versions Read the model version patterns for more details. gemini-2.5-flash-preview-tts calendar_month Latest update May 2025 Gemini 2.5 Pro Preview Text-to-Speech Gemini 2.5 Pro Preview TTS is our most powerful text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Pro rate limits are more restricted since it is an experimental / preview model. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported Batch API Supported 123 Versions Read the model version patterns for more details. gemini-2.5-pro-preview-tts calendar_month Latest update May 2025 Gemini 2.0 Flash Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Not supported Live API Supported Thinking Experimental Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash Stable: gemini-2.0-flash-001 Experimental: gemini-2.0-flash-exp calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Preview Image Generation Gemini 2.0 Flash Preview Image Generation delivers improved image generation features, including generating and editing images conversationally. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash-preview-image-generation save Supported data types Inputs Audio, images, video, and text Output Text and images token_auto Token limits [*] Input token limit 32,000 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported Batch API Supported 123 Versions Read the model version patterns for more details. 
Preview: gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation is not currently supported in a number of countries in Europe, Middle East & Africa calendar_month Latest update May 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash-Lite A Gemini 2.0 Flash model optimized for cost efficiency and low latency. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash-lite save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Not supported Search Not supported Image generation Not supported Audio generation Not supported Live API Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash-lite Stable: gemini-2.0-flash-lite-001 calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Live The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash-live-001 save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported Batch API Not supported URL context Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-live-001 calendar_month Latest update April 2025 cognition_2 Knowledge cutoff August 2024 Gemini 1.5 Flash Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks. Model details Property Description id_card Model code gemini-1.5-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-latest Latest stable: gemini-1.5-flash Stable: gemini-1.5-flash-001 gemini-1.5-flash-002 calendar_month Deprecation date September 2025 calendar_month Latest update September 2024 Gemini 1.5 Flash-8B Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. 
Model details Property Description id_card Model code gemini-1.5-flash-8b save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-8b-latest Latest stable: gemini-1.5-flash-8b Stable: gemini-1.5-flash-8b-001 calendar_month Deprecation date September 2025 calendar_month Latest update October 2024 Gemini 1.5 Pro Try Gemini 2.5 Pro Preview , our most advanced Gemini model to date. Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide-range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text. Model details Property Description id_card Model code gemini-1.5-pro save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 2,097,152 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 7,200 Maximum video length 2 hours Maximum audio length Approximately 19 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-pro-latest Latest stable: gemini-1.5-pro Stable: gemini-1.5-pro-001 gemini-1.5-pro-002 calendar_month Deprecation date September 2025 calendar_month Latest update September 2024 See the examples to explore the capabilities of these model variations. [*] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words. Model version name patterns Gemini models are available in either stable , preview , or experimental versions. In your code, you can use one of the following model name formats to specify which model and version you want to use. Latest stable Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation> . For example, gemini-2.0-flash . Stable Points to a specific stable model. Stable models usually don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.0-flash-001 . Preview Points to a preview model which may not be suitable for production use, come with more restrictive rate limits, but may have billing enabled. To specify a preview version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.5-pro-preview-06-05 . Preview models are not stable and availability of model endpoints is subject to change. Experimental Points to an experimental model which may not be suitable for production use and come with more restrictive rate limits. 
We release experimental models to gather feedback and get our latest updates into the hands of developers quickly. To specify an experimental version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.0-pro-exp-02-05 . Experimental models are not stable and availability of model endpoints is subject to change. Experimental models In addition to stable models, the Gemini API offers experimental models which may not be suitable for production use and come with more restrictive rate limits. We release experimental models to gather feedback, get our latest updates into the hands of developers quickly, and highlight the pace of innovation happening at Google. What we learn from experimental launches informs how we release models more widely. An experimental model can be swapped for another without prior notice. We don't guarantee that an experimental model will become a stable model in the future. Previous experimental models As new versions or stable releases become available, we remove and replace experimental models. You can find the previous experimental models we released in the following section along with the replacement version: Model code Base model Replacement version gemini-embedding-exp-03-07 Gemini Embedding gemini-embedding-001 gemini-2.5-flash-preview-04-17 Gemini 2.5 Flash gemini-2.5-flash-preview-05-20 gemini-2.0-flash-exp-image-generation Gemini 2.0 Flash gemini-2.0-flash-preview-image-generation gemini-2.5-pro-preview-06-05 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-05-06 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-03-25 Gemini 2.5 Pro gemini-2.5-pro gemini-2.0-flash-thinking-exp-01-21 Gemini 2.5 Flash gemini-2.5-flash-preview-04-17 gemini-2.0-pro-exp-02-05 Gemini 2.0 Pro Experimental gemini-2.5-pro-preview-03-25 gemini-2.0-flash-exp Gemini 2.0 Flash gemini-2.0-flash gemini-exp-1206 Gemini 2.0 Pro gemini-2.0-pro-exp-02-05 gemini-2.0-flash-thinking-exp-1219 Gemini 2.0 Flash Thinking gemini-2.0-flash-thinking-exp-01-21 gemini-exp-1121 Gemini gemini-exp-1206 gemini-exp-1114 Gemini gemini-exp-1206 gemini-1.5-pro-exp-0827 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-pro-exp-0801 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-flash-8b-exp-0924 Gemini 1.5 Flash-8B gemini-1.5-flash-8b gemini-1.5-flash-8b-exp-0827 Gemini 1.5 Flash-8B gemini-1.5-flash-8b Supported languages Gemini models are trained to work with the following languages: Arabic ( ar ) Bengali ( bn ) Bulgarian ( bg ) Chinese simplified and traditional ( zh ) Croatian ( hr ) Czech ( cs ) Danish ( da ) Dutch ( nl ) English ( en ) Estonian ( et ) Finnish ( fi ) French ( fr ) German ( de ) Greek ( el ) Hebrew ( iw ) Hindi ( hi ) Hungarian ( hu ) Indonesian ( id ) Italian ( it ) Japanese ( ja ) Korean ( ko ) Latvian ( lv ) Lithuanian ( lt ) Norwegian ( no ) Polish ( pl ) Portuguese ( pt ) Romanian ( ro ) Russian ( ru ) Serbian ( sr ) Slovak ( sk ) Slovenian ( sl ) Spanish ( es ) Swahili ( sw ) Swedish ( sv ) Thai ( th ) Turkish ( tr ) Ukrainian ( uk ) Vietnamese ( vi ) Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-14 UTC.
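Because preview and experimental endpoints can be swapped or retired without notice, it can help to confirm programmatically which model codes are currently served before pinning one. A minimal sketch, assuming the google-genai Python SDK and a GEMINI_API_KEY environment variable; the "wanted" prefix is only an example, and the name/display fields follow the SDK's Model type.

# Sketch: list currently available model codes and flag preview/experimental ones.
# Assumes google-genai SDK + GEMINI_API_KEY in the environment.
from google import genai

client = genai.Client()

wanted = "gemini-2.5-flash"  # substitute any model code from the tables above
for model in client.models.list():
    # model.name has the form "models/<model-code>"
    code = model.name.removeprefix("models/")
    if code.startswith(wanted):
        label = "preview/experimental" if ("preview" in code or "exp" in code) else "stable"
        print(f"{code} ({label})")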
text_content/docs_models_4b13d9ba.txt CHANGED
@@ -2,4 +2,4 @@ URL: https://ai.google.dev/gemini-api/docs/models#gemini-1.5-flash
  Title: Gemini models  |  Gemini API  |  Google AI for Developers
  ==================================================
 
- Gemini models | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Introducing Batch Mode, with higher rate limits and a 50% token discount. Learn more Home Gemini API Models Send feedback Gemini models 2.5 Pro spark Our most powerful thinking model with maximum response accuracy and state-of-the-art performance Input audio, images, video, and text, get text responses Tackle difficult problems, analyze large databases, and more Best for complex coding, reasoning, and multimodal understanding 2.5 Flash spark Our best model in terms of price-performance, offering well-rounded capabilities. Input audio, images, video, and text, and get text responses Model thinks as needed; or, you can configure a thinking budget Best for low latency, high volume tasks that require thinking 2.5 Flash-Lite experiment A Gemini 2.5 Flash model optimized for cost efficiency and low latency. Input audio, images, video, and text, and get text responses Most cost-efficient model supporting high throughput Best for real time, low latency use cases Note: Gemini 2.5 Pro and 2.5 Flash come with thinking on by default . If you're migrating from a non-thinking model such as 2.0 Pro or Flash, we recommend you to review the Thinking guide first. Model variants The Gemini API offers different models that are optimized for specific use cases. Here's a brief overview of Gemini variants that are available: Model variant Input(s) Output Optimized for Gemini 2.5 Pro gemini-2.5-pro Audio, images, videos, text, and PDF Text Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more Gemini 2.5 Flash gemini-2.5-flash Audio, images, videos, and text Text Adaptive thinking, cost efficiency Gemini 2.5 Flash-Lite Preview gemini-2.5-flash-lite-preview-06-17 Text, image, video, audio Text Most cost-efficient model supporting high throughput Gemini 2.5 Flash Native Audio gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog Audio, videos, and text Text and audio, interleaved High quality, natural conversational audio outputs, with or without thinking Gemini 2.5 Flash Preview TTS gemini-2.5-flash-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.5 Pro Preview TTS gemini-2.5-pro-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.0 Flash gemini-2.0-flash Audio, images, videos, and text Text Next generation features, speed, and realtime streaming. 
Gemini 2.0 Flash Preview Image Generation gemini-2.0-flash-preview-image-generation Audio, images, videos, and text Text, images Conversational image generation and editing Gemini 2.0 Flash-Lite gemini-2.0-flash-lite Audio, images, videos, and text Text Cost efficiency and low latency Gemini 1.5 Flash gemini-1.5-flash Audio, images, videos, and text Text Fast and versatile performance across a diverse variety of tasks Gemini 1.5 Flash-8B gemini-1.5-flash-8b Audio, images, videos, and text Text High volume and lower intelligence tasks Gemini 1.5 Pro gemini-1.5-pro Audio, images, videos, and text Text Complex reasoning tasks requiring more intelligence Gemini Embedding gemini-embedding-exp Text Text embeddings Measuring the relatedness of text strings Imagen 4 imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 Text Images Our most up-to-date image generation model Imagen 3 imagen-3.0-generate-002 Text Images High quality image generation model Veo 2 veo-2.0-generate-001 Text, images Video High quality video generation Gemini 2.5 Flash Live gemini-live-2.5-flash-preview Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions Gemini 2.0 Flash Live gemini-2.0-flash-live-001 Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions You can view the rate limits for each model on the rate limits page . Gemini 2.5 Pro Gemini 2.5 Pro is our state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro save Supported data types Inputs Audio, images, video, text, and PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-pro Preview: gemini-2.5-pro-preview-06-05 Preview: gemini-2.5-pro-preview-05-06 Preview: gemini-2.5-pro-preview-03-25 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Our best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large scale processing, low-latency, high volume tasks that require thinking, and agentic use cases. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash save Supported data types Inputs Text, images, video, audio Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Audio generation Not supported Caching Supported Code execution Supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Supported Thinking Supported Tuning Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-flash Preview: gemini-2.5-flash-preview-05-20 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash-Lite Preview A Gemini 2.5 Flash model optimized for cost efficiency and low latency. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-lite-preview-06-17 save Supported data types Inputs Text, images, video, and audio Output Text token_auto Token limits [*] Input token limit 1,000,000 Output token limit 64,000 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported URL Context Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-lite-preview-06-17 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Native Audio Our native audio dialog models, with and without thinking, available through the Live API . These models provide interactive and unstructured conversational experiences, with style and control prompting. Try native audio in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-native-audio-dialog & models/gemini-2.5-flash-exp-native-audio-thinking-dialog save Supported data types Inputs Audio, video, text Output Audio and text token_auto Token limits [*] Input token limit 128,000 Output token limit 8,000 handyman Capabilities Audio generation Supported Caching Not supported Code execution Not supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Not supported Thinking Supported Tuning Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-preview-05-20 Experimental: gemini-2.5-flash-exp-native-audio-thinking-dialog calendar_month Latest update May 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Preview Text-to-Speech Gemini 2.5 Flash Preview TTS is our price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Flash rate limits are more restricted since it is an experimental / preview model. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-flash-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-flash-preview-tts calendar_month Latest update May 2025 Gemini 2.5 Pro Preview Text-to-Speech Gemini 2.5 Pro Preview TTS is our most powerful text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Pro rate limits are more restricted since it is an experimental / preview model. 
Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.5-pro-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported 123 Versions Read the model version patterns for more details. gemini-2.5-pro-preview-tts calendar_month Latest update May 2025 Gemini 2.0 Flash Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Not supported Live API Supported Thinking Experimental Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash Stable: gemini-2.0-flash-001 Experimental: gemini-2.0-flash-exp calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Preview Image Generation Gemini 2.0 Flash Preview Image Generation delivers improved image generation features, including generating and editing images conversationally. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-preview-image-generation save Supported data types Inputs Audio, images, video, and text Output Text and images token_auto Token limits [*] Input token limit 32,000 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation is not currently supported in a number of countries in Europe, Middle East & Africa calendar_month Latest update May 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash-Lite A Gemini 2.0 Flash model optimized for cost efficiency and low latency. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-lite save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Not supported Search Not supported Image generation Not supported Audio generation Not supported Live API Not supported Batch API Supported 123 Versions Read the model version patterns for more details. 
Latest: gemini-2.0-flash-lite Stable: gemini-2.0-flash-lite-001 calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 1.5 Flash Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-latest Latest stable: gemini-1.5-flash Stable: gemini-1.5-flash-001 gemini-1.5-flash-002 calendar_month Latest update September 2024 Gemini 1.5 Flash-8B Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-flash-8b save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-8b-latest Latest stable: gemini-1.5-flash-8b Stable: gemini-1.5-flash-8b-001 calendar_month Latest update October 2024 Gemini 1.5 Pro Try Gemini 2.5 Pro Preview , our most advanced Gemini model to date. Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide-range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-1.5-pro save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 2,097,152 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 7,200 Maximum video length 2 hours Maximum audio length Approximately 19 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-pro-latest Latest stable: gemini-1.5-pro Stable: gemini-1.5-pro-001 gemini-1.5-pro-002 calendar_month Latest update September 2024 Imagen 4 Imagen 4 is our latest image model, capable of generating highly detailed images with rich lighting, significantly better text rendering, and higher resolution output than previous models. 
Model details Property Description id_card Model code Gemini API imagen-4.0-generate-preview-06-06 imagen-4.0-ultra-generate-preview-06-06 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit 480 tokens (text) Output images 1 (Ultra) 1 to 4 (Standard) calendar_month Latest update June 2025 Imagen 3 Imagen 3 is our highest quality text-to-image model, capable of generating images with even better detail, richer lighting and fewer distracting artifacts than our previous models. Model details Property Description id_card Model code Gemini API imagen-3.0-generate-002 save Supported data types Input Text Output Images token_auto Token limits [*] Input token limit N/A Output images Up to 4 calendar_month Latest update February 2025 Veo 2 Veo 2 is our high quality text- and image-to-video model, capable of generating detailed videos, capturing the artistic nuance in your prompts. Model details Property Description id_card Model code Gemini API veo-2.0-generate-001 save Supported data types Input Text, image Output Video token_auto Limits Text input N/A Image input Any image resolution and aspect ratio up to 20MB file size Output video Up to 2 calendar_month Latest update April 2025 Gemini 2.5 Flash Live The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-live-2.5-flash-preview save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-live-2.5-flash-preview calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.0 Flash Live The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code models/gemini-2.0-flash-live-001 save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-live-001 calendar_month Latest update April 2025 cognition_2 Knowledge cutoff August 2024 Gemini Embedding Experimental Gemini embedding achieves a SOTA performance across many key dimensions including code, multi-lingual, and retrieval. Gemini Embedding rate limits are more restricted since it is an experimental model. 
Model details Property Description id_card Model code Gemini API gemini-embedding-exp-03-07 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 8,192 Output dimension size Elastic, supports: 3072, 1536, or 768 calendar_month Latest update March 2025 Text Embedding and Embedding Text Embedding Try our new experimental Gemini embedding model which achieves state-of-the-art performance. Text embeddings are used to measure the relatedness of strings and are widely used in many AI applications. text-embedding-004 achieves a stronger retrieval performance and outperforms existing models with comparable dimensions, on the standard MTEB embedding benchmarks. Model details Property Description id_card Model code Gemini API models/text-embedding-004 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update April 2024 Embedding Note: Text Embedding is the newer version of the Embedding model. If you're creating a new project, use Text Embedding. You can use the Embedding model to generate text embeddings for input text. The Embedding model is optimized for creating embeddings with 768 dimensions for text of up to 2,048 tokens. Embedding model details Property Description id_card Model code models/embedding-001 save Supported data types Input Text Output Text embeddings token_auto Token limits [*] Input token limit 2,048 Output dimension size 768 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Not supported calendar_month Latest update December 2023 AQA You can use the AQA model to perform Attributed Question-Answering (AQA)–related tasks over a document, corpus, or a set of passages. The AQA model returns answers to questions that are grounded in provided sources, along with estimating answerable probability. Model details Property Description id_card Model code models/aqa save Supported data types Input Text Output Text language Supported language English token_auto Token limits [*] Input token limit 7,168 Output token limit 1,024 swap_driving_apps_wheel Rate limits [**] 1,500 requests per minute encrypted Adjustable safety settings Supported calendar_month Latest update December 2023 See the examples to explore the capabilities of these model variations. [*] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words. Model version name patterns Gemini models are available in either stable , preview , or experimental versions. In your code, you can use one of the following model name formats to specify which model and version you want to use. Latest stable Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation> . For example, gemini-2.0-flash . Stable Points to a specific stable model. Stable models usually don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.0-flash-001 . Preview Points to a preview model which may not be suitable for production use, come with more restrictive rate limits, but may have billing enabled. 
To specify a preview version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.5-pro-preview-06-05 . Experimental Points to an experimental model which may not be suitable for production use and come with more restrictive rate limits. We release experimental models to gather feedback and get our latest updates into the hands of developers quickly. To specify an experimental version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.0-pro-exp-02-05 . Experimental models In addition to stable models, the Gemini API offers experimental models which may not be suitable for production use and come with more restrictive rate limits. We release experimental models to gather feedback, get our latest updates into the hands of developers quickly, and highlight the pace of innovation happening at Google. What we learn from experimental launches informs how we release models more widely. An experimental model can be swapped for another without prior notice. We don't guarantee that an experimental model will become a stable model in the future. Previous experimental models As new versions or stable releases become available, we remove and replace experimental models. You can find the previous experimental models we released in the following section along with the replacement version: Model code Base model Replacement version gemini-2.5-flash-preview-04-17 Gemini 2.5 Flash gemini-2.5-flash-preview-05-20 gemini-2.0-flash-exp-image-generation Gemini 2.0 Flash gemini-2.0-flash-preview-image-generation gemini-2.5-pro-preview-06-05 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-05-06 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-03-25 Gemini 2.5 Pro gemini-2.5-pro gemini-2.0-flash-thinking-exp-01-21 Gemini 2.5 Flash gemini-2.5-flash-preview-04-17 gemini-2.0-pro-exp-02-05 Gemini 2.0 Pro Experimental gemini-2.5-pro-preview-03-25 gemini-2.0-flash-exp Gemini 2.0 Flash gemini-2.0-flash gemini-exp-1206 Gemini 2.0 Pro gemini-2.0-pro-exp-02-05 gemini-2.0-flash-thinking-exp-1219 Gemini 2.0 Flash Thinking gemini-2.0-flash-thinking-exp-01-21 gemini-exp-1121 Gemini gemini-exp-1206 gemini-exp-1114 Gemini gemini-exp-1206 gemini-1.5-pro-exp-0827 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-pro-exp-0801 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-flash-8b-exp-0924 Gemini 1.5 Flash-8B gemini-1.5-flash-8b gemini-1.5-flash-8b-exp-0827 Gemini 1.5 Flash-8B gemini-1.5-flash-8b Supported languages Gemini models are trained to work with the following languages: Arabic ( ar ) Bengali ( bn ) Bulgarian ( bg ) Chinese simplified and traditional ( zh ) Croatian ( hr ) Czech ( cs ) Danish ( da ) Dutch ( nl ) English ( en ) Estonian ( et ) Finnish ( fi ) French ( fr ) German ( de ) Greek ( el ) Hebrew ( iw ) Hindi ( hi ) Hungarian ( hu ) Indonesian ( id ) Italian ( it ) Japanese ( ja ) Korean ( ko ) Latvian ( lv ) Lithuanian ( lt ) Norwegian ( no ) Polish ( pl ) Portuguese ( pt ) Romanian ( ro ) Russian ( ru ) Serbian ( sr ) Slovak ( sk ) Slovenian ( sl ) Spanish ( es ) Swahili ( sw ) Swedish ( sv ) Thai ( th ) Turkish ( tr ) Ukrainian ( uk ) Vietnamese ( vi ) Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-07-07 UTC.
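The version name patterns described above map directly onto the model parameter of an API call. The following is a minimal sketch, assuming the google-genai Python client and an API key already set in the environment, of selecting a latest-stable, pinned-stable, or preview model ID; the model IDs come from the tables above and the prompt string is illustrative only.

# Minimal sketch: choosing a model by version name pattern.
# Assumes GEMINI_API_KEY is set in the environment; IDs are from the tables above.
from google import genai

client = genai.Client()

MODEL_IDS = {
    "latest_stable": "gemini-2.0-flash",               # <model>-<generation>-<variation>
    "pinned_stable": "gemini-2.0-flash-001",           # <model>-<generation>-<variation>-<version>
    "preview": "gemini-2.5-pro-preview-06-05",         # may be swapped or retired without notice
}

# Most production apps should pin a specific stable version.
response = client.models.generate_content(
    model=MODEL_IDS["pinned_stable"],
    contents="Summarize the difference between stable and preview model versions.",
)
print(response.text)

Because preview and experimental IDs can change or be retired without notice, pinning a stable version as shown is the safer default for production use.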
 
Title: Gemini models  |  Gemini API  |  Google AI for Developers
==================================================

+ Gemini models | Gemini API | Google AI for Developers Skip to main content / English Deutsch Español – América Latina Français Indonesia Italiano Polski Português – Brasil Shqip Tiếng Việt Türkçe Русский עברית العربيّة فارسی हिंदी বাংলা ภาษาไทย 中文 – 简体 中文 – 繁體 日本語 한국어 Sign in Gemini 2.5 Flash Image (aka Nano Banana) is now available in the Gemini API! Learn more Home Gemini API Gemini API docs Send feedback Gemini models 2.5 Pro spark Our most powerful thinking model with maximum response accuracy and state-of-the-art performance Input audio, images, video, and text, get text responses Tackle difficult problems, analyze large databases, and more Best for complex coding, reasoning, and multimodal understanding 2.5 Flash spark Our best model in terms of price-performance, offering well-rounded capabilities. Input audio, images, video, and text, and get text responses Model thinks as needed; or, you can configure a thinking budget Best for low latency, high volume tasks that require thinking 2.5 Flash-Lite spark A Gemini 2.5 Flash model optimized for cost efficiency and low latency. Input audio, images, video, and text, and get text responses Most cost-efficient model supporting high throughput Best for real time, low latency use cases Note: Gemini 2.5 Pro and 2.5 Flash come with thinking on by default . If you're migrating from a non-thinking model such as 2.0 Pro or Flash, we recommend you to review the Thinking guide first. Model variants The Gemini API offers different models that are optimized for specific use cases. Here's a brief overview of Gemini variants that are available: Model variant Input(s) Output Optimized for Gemini 2.5 Pro gemini-2.5-pro Audio, images, videos, text, and PDF Text Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more Gemini 2.5 Flash gemini-2.5-flash Audio, images, videos, and text Text Adaptive thinking, cost efficiency Gemini 2.5 Flash-Lite gemini-2.5-flash-lite Text, image, video, audio Text Most cost-efficient model supporting high throughput Gemini 2.5 Flash Live gemini-live-2.5-flash-preview Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions Gemini 2.5 Flash Native Audio gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog Audio, videos, and text Text and audio, interleaved High quality, natural conversational audio outputs, with or without thinking Gemini 2.5 Flash Image Preview gemini-2.5-flash-image-preview Images and text Images and text Precise, conversational image generation and editing Gemini 2.5 Flash Preview TTS gemini-2.5-flash-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.5 Pro Preview TTS gemini-2.5-pro-preview-tts Text Audio Low latency, controllable, single- and multi-speaker text-to-speech audio generation Gemini 2.0 Flash gemini-2.0-flash Audio, images, videos, and text Text Next generation features, speed, and realtime streaming. 
Gemini 2.0 Flash Preview Image Generation gemini-2.0-flash-preview-image-generation Audio, images, videos, and text Text, images Conversational image generation and editing Gemini 2.0 Flash-Lite gemini-2.0-flash-lite Audio, images, videos, and text Text Cost efficiency and low latency Gemini 2.0 Flash Live gemini-2.0-flash-live-001 Audio, video, and text Text, audio Low-latency bidirectional voice and video interactions Gemini 1.5 Flash gemini-1.5-flash Audio, images, videos, and text Text Fast and versatile performance across a diverse variety of tasks Deprecated Gemini 1.5 Flash-8B gemini-1.5-flash-8b Audio, images, videos, and text Text High volume and lower intelligence tasks Deprecated Gemini 1.5 Pro gemini-1.5-pro Audio, images, videos, and text Text Complex reasoning tasks requiring more intelligence Deprecated You can view the rate limits for each model on the rate limits page . Gemini 2.5 Pro Gemini 2.5 Pro is our state-of-the-art thinking model, capable of reasoning over complex problems in code, math, and STEM, as well as analyzing large datasets, codebases, and documents using long context. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro save Supported data types Inputs Audio, images, video, text, and PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Function calling Supported Code execution Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported URL Context Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-pro calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Our best model in terms of price-performance, offering well-rounded capabilities. 2.5 Flash is best for large scale processing, low-latency, high volume tasks that require thinking, and agentic use cases. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash save Supported data types Inputs Text, images, video, audio Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Audio generation Not supported Caching Supported Code execution Supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Supported Thinking Supported Batch API Supported URL Context Supported 123 Versions Read the model version patterns for more details. Stable: gemini-2.5-flash Preview: gemini-2.5-flash-preview-05-20 calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash-Lite A Gemini 2.5 Flash model optimized for cost-efficiency and high throughput. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-lite save Supported data types Inputs Text, image, video, audio, PDF Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 65,536 handyman Capabilities Structured outputs Supported Caching Supported Function calling Supported Code execution Supported URL Context Supported Search grounding Supported Image generation Not supported Audio generation Not supported Live API Not supported Thinking Supported Batch API Supported URL Context Supported 123 Versions Read the model version patterns for more details. 
Stable: gemini-2.5-flash-lite Preview: gemini-2.5-flash-lite-06-17 calendar_month Latest update July 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Live The Gemini 2.5 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code gemini-live-2.5-flash-preview save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported Batch API Not supported URL context Supported 123 Versions Read the model version patterns for more details. Preview: gemini-live-2.5-flash-preview calendar_month Latest update June 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Native Audio Our native audio dialog models, with and without thinking, available through the Live API . These models provide interactive and unstructured conversational experiences, with style and control prompting. Try native audio in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-preview-native-audio-dialog & gemini-2.5-flash-exp-native-audio-thinking-dialog save Supported data types Inputs Audio, video, text Output Audio and text token_auto Token limits [*] Input token limit 128,000 Output token limit 8,000 handyman Capabilities Audio generation Supported Caching Not supported Code execution Not supported Function calling Supported Image generation Not supported Search grounding Supported Structured outputs Not supported Thinking Supported Batch API Not supported Tuning Not supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-preview-05-20 Experimental: gemini-2.5-flash-exp-native-audio-thinking-dialog calendar_month Latest update May 2025 cognition_2 Knowledge cutoff January 2025 Gemini 2.5 Flash Image Preview Gemini 2.5 Flash Image Preview is our latest, fastest, and most efficient natively multimodal model that lets you generate and edit images conversationally. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-image-preview save Supported data types Inputs Images and text Output Images and text token_auto Token limits [*] Input token limit 32,768 Output token limit 32,768 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported Batch API Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.5-flash-image-preview calendar_month Latest update August 2025 cognition_2 Knowledge cutoff June 2025 Gemini 2.5 Flash Preview Text-to-Speech Gemini 2.5 Flash Preview TTS is our price-performant text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Flash rate limits are more restricted since it is an experimental / preview model. 
Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-flash-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported Batch API Supported 123 Versions Read the model version patterns for more details. gemini-2.5-flash-preview-tts calendar_month Latest update May 2025 Gemini 2.5 Pro Preview Text-to-Speech Gemini 2.5 Pro Preview TTS is our most powerful text-to-speech model, delivering high control and transparency for structured workflows like podcast generation, audiobooks, customer support, and more. Gemini 2.5 Pro rate limits are more restricted since it is an experimental / preview model. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.5-pro-preview-tts save Supported data types Inputs Text Output Audio token_auto Token limits [*] Input token limit 8,000 Output token limit 16,000 handyman Capabilities Structured outputs Not supported Caching Not supported Tuning Not supported Function calling Not supported Code execution Not supported Search Not supported Audio generation Supported Live API Not supported Thinking Not supported Batch API Supported 123 Versions Read the model version patterns for more details. gemini-2.5-pro-preview-tts calendar_month Latest update May 2025 Gemini 2.0 Flash Gemini 2.0 Flash delivers next-gen features and improved capabilities, including superior speed, native tool use, and a 1M token context window. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Not supported Live API Supported Thinking Experimental Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash Stable: gemini-2.0-flash-001 Experimental: gemini-2.0-flash-exp calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Preview Image Generation Gemini 2.0 Flash Preview Image Generation delivers improved image generation features, including generating and editing images conversationally. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash-preview-image-generation save Supported data types Inputs Audio, images, video, and text Output Text and images token_auto Token limits [*] Input token limit 32,000 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Not supported Code execution Not Supported Search Not Supported Image generation Supported Audio generation Not supported Live API Not Supported Thinking Not Supported Batch API Supported 123 Versions Read the model version patterns for more details. 
Preview: gemini-2.0-flash-preview-image-generation gemini-2.0-flash-preview-image-generation is not currently supported in a number of countries in Europe, Middle East & Africa calendar_month Latest update May 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash-Lite A Gemini 2.0 Flash model optimized for cost efficiency and low latency. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash-lite save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Caching Supported Tuning Not supported Function calling Supported Code execution Not supported Search Not supported Image generation Not supported Audio generation Not supported Live API Not supported Batch API Supported 123 Versions Read the model version patterns for more details. Latest: gemini-2.0-flash-lite Stable: gemini-2.0-flash-lite-001 calendar_month Latest update February 2025 cognition_2 Knowledge cutoff August 2024 Gemini 2.0 Flash Live The Gemini 2.0 Flash Live model works with the Live API to enable low-latency bidirectional voice and video interactions with Gemini. The model can process text, audio, and video input, and it can provide text and audio output. Try in Google AI Studio Model details Property Description id_card Model code gemini-2.0-flash-live-001 save Supported data types Inputs Audio, video, and text Output Text, and audio token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 handyman Capabilities Structured outputs Supported Tuning Not supported Function calling Supported Code execution Supported Search Supported Image generation Not supported Audio generation Supported Thinking Not supported Batch API Not supported URL context Supported 123 Versions Read the model version patterns for more details. Preview: gemini-2.0-flash-live-001 calendar_month Latest update April 2025 cognition_2 Knowledge cutoff August 2024 Gemini 1.5 Flash Gemini 1.5 Flash is a fast and versatile multimodal model for scaling across diverse tasks. Model details Property Description id_card Model code gemini-1.5-flash save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-latest Latest stable: gemini-1.5-flash Stable: gemini-1.5-flash-001 gemini-1.5-flash-002 calendar_month Deprecation date September 2025 calendar_month Latest update September 2024 Gemini 1.5 Flash-8B Gemini 1.5 Flash-8B is a small model designed for lower intelligence tasks. 
Model details Property Description id_card Model code gemini-1.5-flash-8b save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 1,048,576 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 3,600 Maximum video length 1 hour Maximum audio length Approximately 9.5 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-flash-8b-latest Latest stable: gemini-1.5-flash-8b Stable: gemini-1.5-flash-8b-001 calendar_month Deprecation date September 2025 calendar_month Latest update October 2024 Gemini 1.5 Pro Try Gemini 2.5 Pro Preview , our most advanced Gemini model to date. Gemini 1.5 Pro is a mid-size multimodal model that is optimized for a wide-range of reasoning tasks. 1.5 Pro can process large amounts of data at once, including 2 hours of video, 19 hours of audio, codebases with 60,000 lines of code, or 2,000 pages of text. Model details Property Description id_card Model code gemini-1.5-pro save Supported data types Inputs Audio, images, video, and text Output Text token_auto Token limits [*] Input token limit 2,097,152 Output token limit 8,192 movie_info Audio/visual specs Maximum number of images per prompt 7,200 Maximum video length 2 hours Maximum audio length Approximately 19 hours handyman Capabilities System instructions Supported JSON mode Supported JSON schema Supported Adjustable safety settings Supported Caching Supported Tuning Not supported Function calling Supported Code execution Supported Live API Not supported 123 Versions Read the model version patterns for more details. Latest: gemini-1.5-pro-latest Latest stable: gemini-1.5-pro Stable: gemini-1.5-pro-001 gemini-1.5-pro-002 calendar_month Deprecation date September 2025 calendar_month Latest update September 2024 See the examples to explore the capabilities of these model variations. [*] A token is equivalent to about 4 characters for Gemini models. 100 tokens are about 60-80 English words. Model version name patterns Gemini models are available in either stable , preview , or experimental versions. In your code, you can use one of the following model name formats to specify which model and version you want to use. Latest stable Points to the most recent stable version released for the specified model generation and variation. To specify the latest stable version, use the following pattern: <model>-<generation>-<variation> . For example, gemini-2.0-flash . Stable Points to a specific stable model. Stable models usually don't change. Most production apps should use a specific stable model. To specify a stable version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.0-flash-001 . Preview Points to a preview model which may not be suitable for production use, come with more restrictive rate limits, but may have billing enabled. To specify a preview version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.5-pro-preview-06-05 . Preview models are not stable and availability of model endpoints is subject to change. Experimental Points to an experimental model which may not be suitable for production use and come with more restrictive rate limits. 
We release experimental models to gather feedback and get our latest updates into the hands of developers quickly. To specify an experimental version, use the following pattern: <model>-<generation>-<variation>-<version> . For example, gemini-2.0-pro-exp-02-05 . Experimental models are not stable and availability of model endpoints is subject to change. Experimental models In addition to stable models, the Gemini API offers experimental models which may not be suitable for production use and come with more restrictive rate limits. We release experimental models to gather feedback, get our latest updates into the hands of developers quickly, and highlight the pace of innovation happening at Google. What we learn from experimental launches informs how we release models more widely. An experimental model can be swapped for another without prior notice. We don't guarantee that an experimental model will become a stable model in the future. Previous experimental models As new versions or stable releases become available, we remove and replace experimental models. You can find the previous experimental models we released in the following section along with the replacement version: Model code Base model Replacement version gemini-embedding-exp-03-07 Gemini Embedding gemini-embedding-001 gemini-2.5-flash-preview-04-17 Gemini 2.5 Flash gemini-2.5-flash-preview-05-20 gemini-2.0-flash-exp-image-generation Gemini 2.0 Flash gemini-2.0-flash-preview-image-generation gemini-2.5-pro-preview-06-05 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-05-06 Gemini 2.5 Pro gemini-2.5-pro gemini-2.5-pro-preview-03-25 Gemini 2.5 Pro gemini-2.5-pro gemini-2.0-flash-thinking-exp-01-21 Gemini 2.5 Flash gemini-2.5-flash-preview-04-17 gemini-2.0-pro-exp-02-05 Gemini 2.0 Pro Experimental gemini-2.5-pro-preview-03-25 gemini-2.0-flash-exp Gemini 2.0 Flash gemini-2.0-flash gemini-exp-1206 Gemini 2.0 Pro gemini-2.0-pro-exp-02-05 gemini-2.0-flash-thinking-exp-1219 Gemini 2.0 Flash Thinking gemini-2.0-flash-thinking-exp-01-21 gemini-exp-1121 Gemini gemini-exp-1206 gemini-exp-1114 Gemini gemini-exp-1206 gemini-1.5-pro-exp-0827 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-pro-exp-0801 Gemini 1.5 Pro gemini-exp-1206 gemini-1.5-flash-8b-exp-0924 Gemini 1.5 Flash-8B gemini-1.5-flash-8b gemini-1.5-flash-8b-exp-0827 Gemini 1.5 Flash-8B gemini-1.5-flash-8b Supported languages Gemini models are trained to work with the following languages: Arabic ( ar ) Bengali ( bn ) Bulgarian ( bg ) Chinese simplified and traditional ( zh ) Croatian ( hr ) Czech ( cs ) Danish ( da ) Dutch ( nl ) English ( en ) Estonian ( et ) Finnish ( fi ) French ( fr ) German ( de ) Greek ( el ) Hebrew ( iw ) Hindi ( hi ) Hungarian ( hu ) Indonesian ( id ) Italian ( it ) Japanese ( ja ) Korean ( ko ) Latvian ( lv ) Lithuanian ( lt ) Norwegian ( no ) Polish ( pl ) Portuguese ( pt ) Romanian ( ro ) Russian ( ru ) Serbian ( sr ) Slovak ( sk ) Slovenian ( sl ) Spanish ( es ) Swahili ( sw ) Swedish ( sv ) Thai ( th ) Turkish ( tr ) Ukrainian ( uk ) Vietnamese ( vi ) Send feedback Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates. Last updated 2025-09-14 UTC.
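The token limits listed throughout this page assume roughly four characters per token. The following is a minimal sketch, assuming the google-genai Python client and an API key in the environment, of checking a prompt's size against the gemini-2.0-flash input token limit before sending a request; the limit value is taken from the model details above and the prompt text is illustrative only.

# Minimal sketch: checking prompt size against a model's input token limit.
# Assumes GEMINI_API_KEY is set in the environment; the limit is from the tables above.
from google import genai

client = genai.Client()

INPUT_TOKEN_LIMIT = 1_048_576  # gemini-2.0-flash, per the model details above
prompt = "Draft release notes for a text-to-speech feature. " * 100

token_count = client.models.count_tokens(
    model="gemini-2.0-flash",
    contents=prompt,
)
print(f"Prompt uses {token_count.total_tokens} of {INPUT_TOKEN_LIMIT} input tokens")

if token_count.total_tokens <= INPUT_TOKEN_LIMIT:
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=prompt,
    )
    print(response.text)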