Research: Gemini API Migration
Feature Branch: 006-gemini-api-migration
Created: 2025-12-14
Executive Summary
This research consolidates findings for migrating from OpenAI to Google Gemini API. The migration requires using the new google-genai SDK (not the deprecated google-generativeai), with specific patterns for async operations.
Research Findings
1. SDK Selection
Decision: Use google-genai package (new unified SDK)
Rationale:
- The old `google-generativeai` package is deprecated
- New SDK provides unified interface for all Google AI services
- Better async support via the `client.aio` namespace
- Cleaner architecture with centralized `Client` object
Alternatives Considered:
- `google-generativeai` (deprecated, not recommended)
- Direct REST API calls (more complexity, no benefit)
Sources:
2. Model Selection
Decision: Use gemini-2.0-flash-exp for chat/translation/personalization
Rationale:
- Explicitly requested by user
- Experimental model with latest capabilities
- Fast response times suitable for interactive chat
Decision: Use text-embedding-004 for embeddings
Rationale:
- Explicitly requested by user
- Available via Gemini API
- Note: `gemini-embedding-001` is newer (3072 dimensions) but user specified `text-embedding-004`
Sources:
3. Async Pattern
Decision: Use client.aio.models.generate_content() for async operations
Rationale:
- Current codebase uses `asyncio.to_thread()` for OpenAI calls
- New Gemini SDK has native async support via the `client.aio` namespace
- Cleaner than wrapping sync calls in thread pool
Implementation Pattern:
```python
import asyncio

from google import genai

client = genai.Client()

async def main():
    # Async generate content via the client.aio namespace
    response = await client.aio.models.generate_content(
        model='gemini-2.0-flash-exp',
        contents='...'
    )
    print(response.text)

asyncio.run(main())
```
Sources:
4. Conversation History Format
Decision: Map OpenAI message format to Gemini contents format
OpenAI Format (current):
```python
messages = [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
]
```
Gemini Format (target):
```python
from google.genai import types

contents = [
    types.Content(role="user", parts=[types.Part(text="...")]),
    types.Content(role="model", parts=[types.Part(text="...")])
]
```
Key Differences:
- Gemini uses "model" instead of "assistant"
- System prompts should be prepended to first user message or use system_instruction config
- Parts structure for multi-modal support
Implementation Strategy:
- Use `system_instruction` parameter for system prompts
- Convert history format in `get_chat_response` method
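The conversion strategy above can be sketched as a small helper. The function name `openai_to_gemini` is illustrative, and plain dicts stand in for `types.Content`/`types.Part` objects so the shape of the mapping is visible; the real `get_chat_response` method would build the SDK types instead:

```python
def openai_to_gemini(messages):
    """Convert OpenAI-style chat messages to Gemini-style contents.

    Returns (system_instruction, contents). Plain dicts stand in for
    google.genai types.Content / types.Part objects.
    """
    system_parts = []
    contents = []
    for msg in messages:
        if msg["role"] == "system":
            # System prompts go to system_instruction, not the history
            system_parts.append(msg["content"])
        else:
            # Gemini uses "model" where OpenAI uses "assistant"
            role = "model" if msg["role"] == "assistant" else "user"
            contents.append({"role": role, "parts": [{"text": msg["content"]}]})
    system_instruction = "\n".join(system_parts) if system_parts else None
    return system_instruction, contents

system, contents = openai_to_gemini([
    {"role": "system", "content": "Be concise."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
])
# system → "Be concise."
# contents → [{"role": "user", ...}, {"role": "model", ...}]
```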
5. JSON Response Format
Decision: Use response_mime_type for JSON output in personalization
Rationale:
- OpenAI uses `response_format={"type": "json_object"}`
- Gemini uses `config.response_mime_type="application/json"`
Implementation Pattern:
```python
from google.genai import types

# Inside an async function, with client = genai.Client() as above:
response = await client.aio.models.generate_content(
    model='gemini-2.0-flash-exp',
    contents='...',
    config=types.GenerateContentConfig(
        response_mime_type="application/json"
    )
)
```
6. Embeddings Implementation
Decision: Use client.models.embed_content() for embeddings
Implementation Pattern:
```python
from google import genai
from google.genai import types

client = genai.Client()

text = "..."  # document text to embed
result = client.models.embed_content(
    model="text-embedding-004",
    contents=text,
    config=types.EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT")
)
embedding = result.embeddings[0].values
```
Embedding Dimensions:
- text-embedding-004: 768 dimensions
- gemini-embedding-001: 3072 dimensions (default), configurable
Qdrant Compatibility Note:
- OpenAI text-embedding-3-small produces 1536-dimensional vectors
- text-embedding-004 produces 768-dimensional vectors
- Existing Qdrant collections will need re-indexing (out of scope per spec)
7. API Key Configuration
Decision: Environment variable GEMINI_API_KEY
Rationale:
- SDK auto-reads from `GEMINI_API_KEY` environment variable
- Consistent with existing pattern (OPENAI_API_KEY → GEMINI_API_KEY)
Implementation:
```python
from google import genai

# SDK reads GEMINI_API_KEY automatically
client = genai.Client()

# Or explicitly (settings is the app's existing configuration object):
client = genai.Client(api_key=settings.GEMINI_API_KEY)
8. Error Handling
Decision: Map Gemini exceptions to existing HTTP error patterns
| Gemini Exception | HTTP Code | Current OpenAI Pattern |
|---|---|---|
| google.api_core.exceptions.InvalidArgument | 400 | Validation errors |
| google.api_core.exceptions.ResourceExhausted | 429 | Rate limiting |
| google.api_core.exceptions.ServiceUnavailable | 503 | Service unavailable |
| google.api_core.exceptions.GoogleAPIError | 500 | Generic error |
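A minimal sketch of this mapping, keyed by exception class name so it needs no Google imports at module load time; the helper name `http_status_for` and the walk up the exception MRO (so subclasses inherit their parent's code) are assumptions, not existing project code:

```python
# Maps Gemini exception class names to the HTTP codes in the table above
GEMINI_ERROR_TO_HTTP = {
    "InvalidArgument": 400,
    "ResourceExhausted": 429,
    "ServiceUnavailable": 503,
    "GoogleAPIError": 500,
}

def http_status_for(exc):
    """Return the HTTP status to surface for a Gemini API exception.

    Walks the exception's class hierarchy so that subclasses of a mapped
    exception resolve to the same code; anything unmapped falls back to 500.
    """
    for cls in type(exc).__mro__:
        if cls.__name__ in GEMINI_ERROR_TO_HTTP:
            return GEMINI_ERROR_TO_HTTP[cls.__name__]
    return 500  # generic fallback, matching the GoogleAPIError row
```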
Technical Decisions Summary
| Aspect | Decision | Confidence |
|---|---|---|
| SDK Package | google-genai | High |
| Chat Model | gemini-2.0-flash-exp | High (user specified) |
| Embedding Model | text-embedding-004 | High (user specified) |
| Async Pattern | client.aio.models.* | High |
| JSON Output | response_mime_type | High |
| System Prompts | system_instruction config | High |
| API Key | GEMINI_API_KEY env var | High |
Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Embedding dimension mismatch | High | High | Document re-indexing requirement |
| Model availability (experimental) | Medium | Medium | Monitor for stable release |
| Response format differences | Low | Low | Thorough testing |
| Rate limit differences | Low | Medium | Monitor and adjust if needed |