File size: 3,205 Bytes

cfcbbc8

# Model Name Updates in five_step_analysis.ipynb

## Changes Made

Updated the notebook to display cleaner, more readable model names in all plots while maintaining the correct cost lookups.

## Before → After Transformations

| Original Name | Display Name (with Version Date) |
|---------------|----------------------------------|
| `anthropic/claude-haiku:latest` | **Claude Haiku 4.5 (2025-10-01)** |
| `anthropic/claude-opus:latest` | **Claude Opus 4.1 (2025-08-05)** |
| `anthropic/claude-sonnet:latest` | **Claude Sonnet 4.5 (2025-09-29)** |
| `claude-3-5-haiku-latest` | **Claude 3.5 Haiku (2024-10-22)** |
| `google/gemini:latest` | **Gemini 2.5 Pro** |
| `google/gemini-flash` | **Gemini Flash** |
| `gemini-2.0-flash-lite` | **Gemini 2.0 Flash Lite** |
| `openai/o:latest` | **O3 (2025-04-16, Azure)** |
| `openai/gpt-5` | **GPT-5 (2025-08-07)** |
| `openai/gpt-5-mini` | **GPT-5 Mini** |
| `openai/o3` | **O3** |
| `openai/o3-mini` | **O3 Mini** |
| `openai/o4-mini` | **O4 Mini** |
| `xai/grok:latest` | **Grok-3** |
| `xai/grok-mini` | **Grok Mini** |
| `xai/grok-code-fast-1` | **Grok Code Fast 1** |
| `aws/llama-4-maverick` | **Llama-4 Maverick** |
| `aws/llama-4-scout` | **Llama-4 Scout** |
| `gpt-oss-120b` | **GPT-OSS-120B** |
| `gpt-5-codex` | **GPT-5 Codex** |
| `deepseek-r1` | **DeepSeek-R1** |
| `gcp/qwen-3` | **Qwen-3** |

**Note:** Version dates (e.g., 2025-10-01) reflect the actual underlying model versions discovered through CBORG API testing on October 29, 2025.

## Technical Implementation

### What Changed
- Added `MODEL_NAME_MAPPING` dictionary based on CBORG API testing results
- Added `resolve_model_name()` function to convert aliases to display names
- Updated `create_pair_label()` to use resolved names instead of raw strings

### What Stayed the Same
- Cost tables still use original model names (correct behavior)
- Data loading and filtering logic unchanged
- Plot generation code unchanged
- Cost calculations work correctly with original column values

### Key Design Decision
The mapping only affects the `pair` column used for display in plots. The original `supervisor` and `coder` columns remain unchanged, ensuring cost lookups continue to work correctly:

```python
# Cost lookup uses original columns (correct)
sup_model = row['supervisor']  # e.g., "anthropic/claude-haiku:latest"
sup_icost = input_cost.get(sup_model, 0)  # Finds correct price

# Display uses mapped pair column
pair_name = row['pair']  # e.g., "Claude Haiku 4.5"
```

## Benefits

1. **Clearer plot titles**: "Claude Haiku 4.5" instead of "anthropic/claude-haiku:latest"
2. **Easier comparison**: Names highlight the actual model versions
3. **Based on real data**: Names reflect actual underlying models from CBORG API testing
4. **Maintains correctness**: Cost calculations still work properly with original names

## Example Output

Before:
- `anthropic/claude-sonnet:latest`
- `xai/grok:latest`
- `openai/o:latest`
- `openai/gpt-5`

After (with version dates):
- `Claude Sonnet 4.5 (2025-09-29)`
- `Grok-3`
- `O3 (2025-04-16, Azure)`
- `GPT-5 (2025-08-07)`

Much more readable in plot titles and legends, with version dates showing exactly which model snapshot was used!