	# Model Name Updates in five_step_analysis.ipynb

	## Changes Made

	Updated the notebook to display cleaner, more readable model names in all plots while maintaining the correct cost lookups.

	## Before → After Transformations

	| Original Name | Display Name (with Version Date) |
	|---------------|----------------------------------|
	| `anthropic/claude-haiku:latest` | Claude Haiku 4.5 (2025-10-01) |
	| `anthropic/claude-opus:latest` | Claude Opus 4.1 (2025-08-05) |
	| `anthropic/claude-sonnet:latest` | Claude Sonnet 4.5 (2025-09-29) |
	| `claude-3-5-haiku-latest` | Claude 3.5 Haiku (2024-10-22) |
	| `google/gemini:latest` | Gemini 2.5 Pro |
	| `google/gemini-flash` | Gemini Flash |
	| `gemini-2.0-flash-lite` | Gemini 2.0 Flash Lite |
	| `openai/o:latest` | O3 (2025-04-16, Azure) |
	| `openai/gpt-5` | GPT-5 (2025-08-07) |
	| `openai/gpt-5-mini` | GPT-5 Mini |
	| `openai/o3` | O3 |
	| `openai/o3-mini` | O3 Mini |
	| `openai/o4-mini` | O4 Mini |
	| `xai/grok:latest` | Grok-3 |
	| `xai/grok-mini` | Grok Mini |
	| `xai/grok-code-fast-1` | Grok Code Fast 1 |
	| `aws/llama-4-maverick` | Llama-4 Maverick |
	| `aws/llama-4-scout` | Llama-4 Scout |
	| `gpt-oss-120b` | GPT-OSS-120B |
	| `gpt-5-codex` | GPT-5 Codex |
	| `deepseek-r1` | DeepSeek-R1 |
	| `gcp/qwen-3` | Qwen-3 |

	Note: Version dates (e.g., 2025-10-01) reflect the actual underlying model versions discovered through CBORG API testing on October 29, 2025.

	## Technical Implementation

	### What Changed
	- Added `MODEL_NAME_MAPPING` dictionary based on CBORG API testing results
	- Added `resolve_model_name()` function to convert aliases to display names
	- Updated `create_pair_label()` to use resolved names instead of raw strings
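
The notebook's actual code is not reproduced here, so the following is only a minimal sketch of how the three pieces listed above could fit together. The mapping holds a subset of the table entries; the function signatures, the pass-through fallback for unmapped names, and the `" / "` separator in the pair label are assumptions made for illustration.

```python
# Illustrative subset of the alias -> display-name table in this document.
MODEL_NAME_MAPPING = {
    "anthropic/claude-haiku:latest": "Claude Haiku 4.5 (2025-10-01)",
    "anthropic/claude-sonnet:latest": "Claude Sonnet 4.5 (2025-09-29)",
    "openai/o:latest": "O3 (2025-04-16, Azure)",
    "openai/gpt-5": "GPT-5 (2025-08-07)",
    "xai/grok:latest": "Grok-3",
}


def resolve_model_name(model: str) -> str:
    """Return the display name for an alias, or the alias itself if unmapped.

    The pass-through fallback is an assumption, not confirmed notebook behavior.
    """
    return MODEL_NAME_MAPPING.get(model, model)


def create_pair_label(supervisor: str, coder: str) -> str:
    """Build a plot label from two raw model names (label format is assumed)."""
    return f"{resolve_model_name(supervisor)} / {resolve_model_name(coder)}"


label = create_pair_label("anthropic/claude-sonnet:latest", "xai/grok:latest")
print(label)  # Claude Sonnet 4.5 (2025-09-29) / Grok-3
```

Falling back to the raw string keeps plots working when a new model alias appears in the data before the mapping is updated.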

	### What Stayed the Same
	- Cost tables still use original model names (correct behavior)
	- Data loading and filtering logic unchanged
	- Plot generation code unchanged
	- Cost calculations work correctly with original column values

	### Key Design Decision
	The mapping only affects the `pair` column used for display in plots. The original `supervisor` and `coder` columns remain unchanged, ensuring cost lookups continue to work correctly:

	```python
	# Cost lookup uses original columns (correct)
	sup_model = row['supervisor'] # e.g., "anthropic/claude-haiku:latest"
	sup_icost = input_cost.get(sup_model, 0) # Finds correct price

	# Display uses mapped pair column
	pair_name = row['pair'] # e.g., "Claude Haiku 4.5"
	```
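
To see why the raw columns must stay untouched, here is a small self-contained sketch using plain dictionaries in place of the notebook's DataFrame rows. The prices are placeholder numbers, not real pricing, and the `" / "` pair format is an assumption.

```python
# Hypothetical per-token prices, keyed by the ORIGINAL model names.
# The numbers are placeholders for illustration, not real pricing.
input_cost = {
    "anthropic/claude-haiku:latest": 1.0,
    "openai/gpt-5": 2.0,
}

# One result row: raw names in supervisor/coder, display label in pair.
row = {
    "supervisor": "anthropic/claude-haiku:latest",
    "coder": "openai/gpt-5",
    "pair": "Claude Haiku 4.5 (2025-10-01) / GPT-5 (2025-08-07)",
}

# A lookup on the raw column finds the price.
raw_price = input_cost.get(row["supervisor"], 0)

# A lookup on a display name falls back to 0 without raising,
# because the cost table has no key with that name.
display_price = input_cost.get("Claude Haiku 4.5 (2025-10-01)", 0)

print(raw_price, display_price)  # 1.0 0
```

Because `dict.get` with a default of 0 does not raise on a missing key, renaming the `supervisor` or `coder` values would produce zero costs without any error, which is why the mapping is applied only to `pair`.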

	## Benefits

	1. Clearer plot titles: "Claude Haiku 4.5" instead of "anthropic/claude-haiku:latest"
	2. Easier comparison: Names highlight the actual model versions
	3. Based on real data: Names reflect actual underlying models from CBORG API testing
	4. Maintains correctness: Cost calculations still work properly with original names

	## Example Output

	Before:
	- `anthropic/claude-sonnet:latest`
	- `xai/grok:latest`
	- `openai/o:latest`
	- `openai/gpt-5`

	After (with version dates):
	- `Claude Sonnet 4.5 (2025-09-29)`
	- `Grok-3`
	- `O3 (2025-04-16, Azure)`
	- `GPT-5 (2025-08-07)`

	The mapped names are much easier to read in plot titles and legends, and the version dates show exactly which model snapshot was used.