LLM4HEP / MODEL_NAME_UPDATES.md
ho22joshua's picture
initial commit
cfcbbc8
# Model Name Updates in five_step_analysis.ipynb
## Changes Made
Updated the notebook to display cleaner, more readable model names in all plots while maintaining the correct cost lookups.
## Before → After Transformations
| Original Name | Display Name (with Version Date) |
|---------------|----------------------------------|
| `anthropic/claude-haiku:latest` | **Claude Haiku 4.5 (2025-10-01)** |
| `anthropic/claude-opus:latest` | **Claude Opus 4.1 (2025-08-05)** |
| `anthropic/claude-sonnet:latest` | **Claude Sonnet 4.5 (2025-09-29)** |
| `claude-3-5-haiku-latest` | **Claude 3.5 Haiku (2024-10-22)** |
| `google/gemini:latest` | **Gemini 2.5 Pro** |
| `google/gemini-flash` | **Gemini Flash** |
| `gemini-2.0-flash-lite` | **Gemini 2.0 Flash Lite** |
| `openai/o:latest` | **O3 (2025-04-16, Azure)** |
| `openai/gpt-5` | **GPT-5 (2025-08-07)** |
| `openai/gpt-5-mini` | **GPT-5 Mini** |
| `openai/o3` | **O3** |
| `openai/o3-mini` | **O3 Mini** |
| `openai/o4-mini` | **O4 Mini** |
| `xai/grok:latest` | **Grok-3** |
| `xai/grok-mini` | **Grok Mini** |
| `xai/grok-code-fast-1` | **Grok Code Fast 1** |
| `aws/llama-4-maverick` | **Llama-4 Maverick** |
| `aws/llama-4-scout` | **Llama-4 Scout** |
| `gpt-oss-120b` | **GPT-OSS-120B** |
| `gpt-5-codex` | **GPT-5 Codex** |
| `deepseek-r1` | **DeepSeek-R1** |
| `gcp/qwen-3` | **Qwen-3** |
**Note:** Version dates (e.g., 2025-10-01) reflect the actual underlying model versions discovered through CBORG API testing on October 29, 2025.
## Technical Implementation
### What Changed
- Added `MODEL_NAME_MAPPING` dictionary based on CBORG API testing results
- Added `resolve_model_name()` function to convert aliases to display names
- Updated `create_pair_label()` to use resolved names instead of raw strings
### What Stayed the Same
- Cost tables still use original model names (correct behavior)
- Data loading and filtering logic unchanged
- Plot generation code unchanged
- Cost calculations work correctly with original column values
### Key Design Decision
The mapping only affects the `pair` column used for display in plots. The original `supervisor` and `coder` columns remain unchanged, ensuring cost lookups continue to work correctly:
```python
# Cost lookup uses original columns (correct)
sup_model = row['supervisor'] # e.g., "anthropic/claude-haiku:latest"
sup_icost = input_cost.get(sup_model, 0) # Finds correct price
# Display uses mapped pair column
pair_name = row['pair'] # e.g., "Claude Haiku 4.5"
```
## Benefits
1. **Clearer plot titles**: "Claude Haiku 4.5" instead of "anthropic/claude-haiku:latest"
2. **Easier comparison**: Names highlight the actual model versions
3. **Based on real data**: Names reflect actual underlying models from CBORG API testing
4. **Maintains correctness**: Cost calculations still work properly with original names
## Example Output
Before:
- `anthropic/claude-sonnet:latest`
- `xai/grok:latest`
- `openai/o:latest`
- `openai/gpt-5`
After (with version dates):
- `Claude Sonnet 4.5 (2025-09-29)`
- `Grok-3`
- `O3 (2025-04-16, Azure)`
- `GPT-5 (2025-08-07)`
Much more readable in plot titles and legends, with version dates showing exactly which model snapshot was used!