
Research: Gemini API Migration

Feature Branch: 006-gemini-api-migration
Created: 2025-12-14

Executive Summary

This research consolidates findings for migrating from OpenAI to Google Gemini API. The migration requires using the new google-genai SDK (not the deprecated google-generativeai), with specific patterns for async operations.

Research Findings

1. SDK Selection

Decision: Use google-genai package (new unified SDK)

Rationale:

  • The old google-generativeai package is deprecated
  • New SDK provides unified interface for all Google AI services
  • Better async support via client.aio namespace
  • Cleaner architecture with centralized Client object

Alternatives Considered:

  • google-generativeai (deprecated, not recommended)
  • Direct REST API calls (more complexity, no benefit)


2. Model Selection

Decision: Use gemini-2.0-flash-exp for chat/translation/personalization

Rationale:

  • Explicitly requested by user
  • Experimental model with latest capabilities
  • Fast response times suitable for interactive chat

Decision: Use text-embedding-004 for embeddings

Rationale:

  • Explicitly requested by user
  • Available via Gemini API
  • Note: gemini-embedding-001 is newer (3072 dimensions) but user specified text-embedding-004


3. Async Pattern

Decision: Use client.aio.models.generate_content() for async operations

Rationale:

  • Current codebase uses asyncio.to_thread() for OpenAI calls
  • New Gemini SDK has native async support via client.aio namespace
  • Cleaner than wrapping sync calls in thread pool

Implementation Pattern:

from google import genai

client = genai.Client()

# Async generate content
response = await client.aio.models.generate_content(
    model='gemini-2.0-flash-exp',
    contents='...'
)
print(response.text)


4. Conversation History Format

Decision: Map OpenAI message format to Gemini contents format

OpenAI Format (current):

messages = [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
]

Gemini Format (target):

contents = [
    types.Content(role="user", parts=[types.Part(text="...")]),
    types.Content(role="model", parts=[types.Part(text="...")])
]

Key Differences:

  • Gemini uses the role "model" where OpenAI uses "assistant"
  • System prompts are passed via the system_instruction config (or prepended to the first user message)
  • Message text is wrapped in a parts list, which also enables multi-modal content

Implementation Strategy:

  • Use system_instruction parameter for system prompts
  • Convert history format in get_chat_response method
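The conversion described above can be sketched as a small helper (the function name openai_to_gemini is hypothetical; plain dicts are shown for contents, a shape the google-genai SDK accepts alongside types.Content objects):

```python
def openai_to_gemini(messages):
    """Map OpenAI-style chat messages to Gemini contents plus a system instruction."""
    system_instruction = None
    contents = []
    for msg in messages:
        if msg["role"] == "system":
            # Gemini takes system prompts via config, not the contents list
            system_instruction = msg["content"]
        else:
            # Gemini uses "model" where OpenAI uses "assistant"
            role = "model" if msg["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [{"text": msg["content"]}]})
    return system_instruction, contents


system, contents = openai_to_gemini([
    {"role": "system", "content": "You are a translator."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Bonjour"},
])
```

The returned system string would go into types.GenerateContentConfig(system_instruction=...) and the list into the contents parameter.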

5. JSON Response Format

Decision: Use response_mime_type for JSON output in personalization

Rationale:

  • OpenAI uses response_format={"type": "json_object"}
  • Gemini uses config.response_mime_type="application/json"

Implementation Pattern:

from google.genai import types

response = await client.aio.models.generate_content(
    model='gemini-2.0-flash-exp',
    contents='...',
    config=types.GenerateContentConfig(
        response_mime_type="application/json"
    )
)
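Since the JSON arrives as text in response.text, a small guard (hypothetical helper parse_json_response) keeps malformed output from propagating into the personalization logic:

```python
import json


def parse_json_response(text: str) -> dict:
    """Parse JSON text returned by the model; raise a clear error if malformed."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return valid JSON: {exc}") from exc


# Stand-in string for response.text from the call above
profile = parse_json_response('{"tone": "formal", "topics": ["sports"]}')
```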

6. Embeddings Implementation

Decision: Use client.models.embed_content() for embeddings

Implementation Pattern:

from google import genai
from google.genai import types

client = genai.Client()

result = client.models.embed_content(
    model="text-embedding-004",
    contents=text,
    config=types.EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT")
)
embedding = result.embeddings[0].values

Embedding Dimensions:

  • text-embedding-004: 768 dimensions
  • gemini-embedding-001: 3072 dimensions (default), configurable

Qdrant Compatibility Note:

  • OpenAI text-embedding-3-small produces 1536-dimensional vectors
  • text-embedding-004 produces 768-dimensional vectors
  • Existing Qdrant collections will need re-indexing (out of scope per spec)
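Until re-indexing happens, a defensive check can catch mismatched vectors before they reach Qdrant (a sketch; the helper name is hypothetical and EXPECTED_DIM matches text-embedding-004):

```python
EXPECTED_DIM = 768  # text-embedding-004 output size


def check_embedding_dim(vector: list[float]) -> list[float]:
    """Reject vectors whose dimensionality doesn't match the Qdrant collection."""
    if len(vector) != EXPECTED_DIM:
        raise ValueError(
            f"Expected {EXPECTED_DIM}-dim vector, got {len(vector)}; "
            "the Qdrant collection must be re-indexed for the new model"
        )
    return vector


vec = check_embedding_dim([0.0] * 768)
```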

7. API Key Configuration

Decision: Environment variable GEMINI_API_KEY

Rationale:

  • SDK auto-reads from GEMINI_API_KEY environment variable
  • Consistent with existing pattern (OPENAI_API_KEY → GEMINI_API_KEY)

Implementation:

# SDK reads GEMINI_API_KEY automatically
client = genai.Client()

# Or explicitly:
client = genai.Client(api_key=settings.GEMINI_API_KEY)

8. Error Handling

Decision: Map Gemini exceptions to existing HTTP error patterns

Note: the new google-genai SDK raises its own exception hierarchy (google.genai.errors), not the google.api_core exceptions used by the deprecated SDK. APIError carries the HTTP status in its code attribute, with ClientError covering 4xx and ServerError covering 5xx.

| Gemini Exception | HTTP Code | Current OpenAI Pattern |
|---|---|---|
| google.genai.errors.ClientError (code 400) | 400 | Validation errors |
| google.genai.errors.ClientError (code 429) | 429 | Rate limiting |
| google.genai.errors.ServerError (code 503) | 503 | Service unavailable |
| google.genai.errors.APIError (other) | 500 | Generic error |
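Because google.genai.errors.APIError exposes the HTTP status via its code attribute, the mapping can be a small pure function (a sketch; the pass-through set is an assumption based on the mapping above):

```python
def map_gemini_status(code: int) -> int:
    """Map a Gemini API error status code to the HTTP code our API returns."""
    passthrough = {400, 429, 503}  # codes surfaced unchanged to the caller
    return code if code in passthrough else 500


# e.g. inside an `except google.genai.errors.APIError as exc:` block:
#     http_status = map_gemini_status(exc.code)
```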

Technical Decisions Summary

| Aspect | Decision | Confidence |
|---|---|---|
| SDK Package | google-genai | High |
| Chat Model | gemini-2.0-flash-exp | High (user specified) |
| Embedding Model | text-embedding-004 | High (user specified) |
| Async Pattern | client.aio.models.* | High |
| JSON Output | response_mime_type | High |
| System Prompts | system_instruction config | High |
| API Key | GEMINI_API_KEY env var | High |

Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Embedding dimension mismatch | High | High | Document re-indexing requirement |
| Model availability (experimental) | Medium | Medium | Monitor for stable release |
| Response format differences | Low | Low | Thorough testing |
| Rate limit differences | Low | Medium | Monitor and adjust if needed |
