# Model Name Updates in five_step_analysis.ipynb

## Changes Made

Updated the notebook to display cleaner, more readable model names in all plots while maintaining the correct cost lookups.

## Before → After Transformations
| Original Name | Display Name (with Version Date) |
|---------------|----------------------------------|
| `anthropic/claude-haiku:latest` | **Claude Haiku 4.5 (2025-10-01)** |
| `anthropic/claude-opus:latest` | **Claude Opus 4.1 (2025-08-05)** |
| `anthropic/claude-sonnet:latest` | **Claude Sonnet 4.5 (2025-09-29)** |
| `claude-3-5-haiku-latest` | **Claude 3.5 Haiku (2024-10-22)** |
| `google/gemini:latest` | **Gemini 2.5 Pro** |
| `google/gemini-flash` | **Gemini Flash** |
| `gemini-2.0-flash-lite` | **Gemini 2.0 Flash Lite** |
| `openai/o:latest` | **O3 (2025-04-16, Azure)** |
| `openai/gpt-5` | **GPT-5 (2025-08-07)** |
| `openai/gpt-5-mini` | **GPT-5 Mini** |
| `openai/o3` | **O3** |
| `openai/o3-mini` | **O3 Mini** |
| `openai/o4-mini` | **O4 Mini** |
| `xai/grok:latest` | **Grok-3** |
| `xai/grok-mini` | **Grok Mini** |
| `xai/grok-code-fast-1` | **Grok Code Fast 1** |
| `aws/llama-4-maverick` | **Llama-4 Maverick** |
| `aws/llama-4-scout` | **Llama-4 Scout** |
| `gpt-oss-120b` | **GPT-OSS-120B** |
| `gpt-5-codex` | **GPT-5 Codex** |
| `deepseek-r1` | **DeepSeek-R1** |
| `gcp/qwen-3` | **Qwen-3** |

**Note:** Version dates (e.g., 2025-10-01) reflect the actual underlying model versions discovered through CBORG API testing on October 29, 2025.
## Technical Implementation

### What Changed

- Added `MODEL_NAME_MAPPING` dictionary based on CBORG API testing results
- Added `resolve_model_name()` function to convert aliases to display names
- Updated `create_pair_label()` to use resolved names instead of raw strings
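
The three changes above might fit together roughly as follows. This is a minimal sketch, not the notebook's actual code: the function signatures, the fallback to the raw name for unmapped aliases, and the `" / "` separator in the pair label are all assumptions. Only the three identifiers and the mapping entries are taken from this document.

```python
# Illustrative sketch only; signatures and label format are assumptions.

# Subset of the alias -> display-name table above.
MODEL_NAME_MAPPING = {
    "anthropic/claude-haiku:latest": "Claude Haiku 4.5 (2025-10-01)",
    "anthropic/claude-sonnet:latest": "Claude Sonnet 4.5 (2025-09-29)",
    "openai/o:latest": "O3 (2025-04-16, Azure)",
    "openai/gpt-5": "GPT-5 (2025-08-07)",
    "xai/grok:latest": "Grok-3",
}


def resolve_model_name(alias: str) -> str:
    """Return the display name for an alias, or the alias itself if unmapped (assumed fallback)."""
    return MODEL_NAME_MAPPING.get(alias, alias)


def create_pair_label(supervisor: str, coder: str) -> str:
    """Build a plot label from resolved names; the " / " separator is hypothetical."""
    return f"{resolve_model_name(supervisor)} / {resolve_model_name(coder)}"


print(create_pair_label("anthropic/claude-sonnet:latest", "xai/grok:latest"))
```

Falling back to the raw alias for unmapped names is one possible choice: it keeps plots working when a new model appears in the data before the mapping is updated.
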
### What Stayed the Same

- Cost tables still use original model names (correct behavior)
- Data loading and filtering logic unchanged
- Plot generation code unchanged
- Cost calculations work correctly with original column values

### Key Design Decision

The mapping only affects the `pair` column used for display in plots. The original `supervisor` and `coder` columns remain unchanged, ensuring cost lookups continue to work correctly:
```python
# Cost lookup uses original columns (correct)
sup_model = row['supervisor']  # e.g., "anthropic/claude-haiku:latest"
sup_icost = input_cost.get(sup_model, 0)  # Finds correct price

# Display uses mapped pair column
pair_name = row['pair']  # e.g., "Claude Haiku 4.5"
```
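
To see why keeping the original columns matters, here is a small self-contained sketch that uses a plain dictionary in place of a notebook row. The prices in `input_cost`, the `DISPLAY_NAMES` helper table, and the pair format are made-up placeholders, not values from the notebook.

```python
# Illustrative sketch; prices and the pair format are placeholders, not real data.

DISPLAY_NAMES = {  # hypothetical stand-in for MODEL_NAME_MAPPING
    "anthropic/claude-haiku:latest": "Claude Haiku 4.5 (2025-10-01)",
    "openai/gpt-5": "GPT-5 (2025-08-07)",
}

# Cost table keyed by the ORIGINAL model names (placeholder prices).
input_cost = {
    "anthropic/claude-haiku:latest": 1.0,
    "openai/gpt-5": 2.0,
}

row = {"supervisor": "anthropic/claude-haiku:latest", "coder": "openai/gpt-5"}

# Derive the display-only column without touching supervisor/coder.
row["pair"] = " / ".join(
    DISPLAY_NAMES.get(row[key], row[key]) for key in ("supervisor", "coder")
)

# Lookup by the original name finds the price.
sup_icost = input_cost.get(row["supervisor"], 0)

# Lookup by a display name misses the table and falls back to 0.
wrong_icost = input_cost.get(DISPLAY_NAMES[row["supervisor"]], 0)

print(row["pair"])
print(sup_icost, wrong_icost)
```

Because `.get(..., 0)` defaults to zero, overwriting `supervisor` or `coder` with display names would not raise an error; it would quietly produce zero costs. Keeping the mapping confined to `pair` avoids that failure mode.
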
## Benefits

1. **Clearer plot titles**: "Claude Haiku 4.5" instead of "anthropic/claude-haiku:latest"
2. **Easier comparison**: Names highlight the actual model versions
3. **Based on real data**: Names reflect actual underlying models from CBORG API testing
4. **Maintains correctness**: Cost calculations still work properly with original names
## Example Output

Before:

- `anthropic/claude-sonnet:latest`
- `xai/grok:latest`
- `openai/o:latest`
- `openai/gpt-5`

After (with version dates):

- `Claude Sonnet 4.5 (2025-09-29)`
- `Grok-3`
- `O3 (2025-04-16, Azure)`
- `GPT-5 (2025-08-07)`

The mapped names are much more readable in plot titles and legends, and the version dates, where available, show exactly which model snapshot was used.