Adding interactive charts + assesment
- ASSESSMENT.md +291 -0
- app/src/content/assets/data/basic_metrics.csv +2 -2
- app/src/content/assets/data/by_rule.json +2 -2
- app/src/content/assets/data/by_rule.png +2 -2
- app/src/content/assets/data/complexity_analysis.json +2 -2
- app/src/content/assets/data/complexity_analysis.png +2 -2
- app/src/content/assets/data/model_claude_haiku_4_5.png +3 -0
- app/src/content/assets/data/model_claude_opus_4_5.png +3 -0
- app/src/content/assets/data/model_deepseek_r1.png +3 -0
- app/src/content/assets/data/model_gemini_3_flash_preview_low.png +3 -0
- app/src/content/assets/data/model_gpt_5_2_high.png +3 -0
- app/src/content/assets/data/model_gpt_5_mini_medium.png +3 -0
- app/src/content/assets/data/model_gpt_oss_120b.png +3 -0
- app/src/content/assets/data/model_gpt_oss_20b.png +3 -0
- app/src/content/assets/data/model_grok_4_1_fast_reasoning.png +3 -0
- app/src/content/assets/data/model_kimi_k2.png +3 -0
- app/src/content/assets/data/overall_performance.json +2 -2
- app/src/content/assets/data/overall_performance.png +2 -2
- app/src/content/assets/data/reckless_guessing.json +3 -0
- app/src/content/assets/data/reckless_guessing.png +3 -0
- app/src/content/assets/data/score_stack.json +3 -0
- app/src/content/assets/data/score_stack.png +3 -0
- app/src/content/assets/data/score_vs_failed_guesses.json +2 -2
- app/src/content/assets/data/score_vs_failed_guesses.png +2 -2
- app/src/content/assets/data/summary.txt +91 -51
- app/src/content/chapters/eleusis/benchmark.mdx +2 -2
- app/src/content/chapters/eleusis/results.mdx +59 -42
- app/src/content/embeds/by-rule.html +521 -0
- app/src/content/embeds/calibration-curves.html +537 -0
- app/src/content/embeds/caution-vs-failed-guesses.html +369 -0
- app/src/content/embeds/complexity-analysis.html +492 -0
- app/src/content/embeds/confidence-distribution.html +495 -0
- app/src/content/embeds/excess-caution.html +384 -0
- app/src/content/embeds/reckless-guessing.html +400 -0
- app/src/content/embeds/score-stack.html +440 -0
- app/src/content/embeds/score-vs-failed-guesses.html +369 -0
- dark-mode-image.md +48 -0
- interactive-charts.md +498 -0
ASSESSMENT.md
ADDED
|
@@ -0,0 +1,291 @@
|
| 1 |
+
# Critical Assessment: Eleusis Benchmark Article
|
| 2 |
+
|
| 3 |
+
## Executive Summary
|
| 4 |
+
|
| 5 |
+
The article presents an interesting benchmark with solid methodology and rich data. The main structural issue is that the **Results section tells a fragmented story about guessing behavior**, spreading related insights across 6+ subsections without a clear narrative arc. The key message—that metacognition matters and models have distinct "scientific personalities"—gets lost in the noise.
|
| 6 |
+
|
| 7 |
+
Additionally, there are **data consistency issues** between the text and the underlying data files that need resolution before publication.
|
| 8 |
+
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
## 1. Critical Issues
|
| 12 |
+
|
| 13 |
+
### 1.1 Data Inconsistencies
|
| 14 |
+
|
| 15 |
+
The numbers in the text don't match `summary.txt`. For example:
|
| 16 |
+
|
| 17 |
+
| Metric | In Text | In summary.txt |
|
| 18 |
+
|--------|---------|----------------|
|
| 19 |
+
| Claude Opus 4.5 avg score | 15.88 (CLAUDE.md) | 14.46 |
|
| 20 |
+
| Kimi K2 avg score | 14.53 (CLAUDE.md) | 10.31 |
|
| 21 |
+
| GPT 5.2 High rank | "third place" | Actually 1st by avg_score (14.85) |
|
| 22 |
+
|
| 23 |
+
**Action needed:** Audit all numbers in the text against the latest data files.
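
The audit could be scripted rather than done by eye. Below is a minimal sketch, assuming `basic_metrics.csv` is a plain CSV with `model` and `avg_score` columns (the column names are taken from the `summary.txt` header; the actual file layout is an assumption). The inline CSV excerpt and the `CLAIMED` dictionary are hypothetical stand-ins populated with the values from the table above.

```python
import csv
import io

# Hypothetical excerpt standing in for basic_metrics.csv; values copied from summary.txt.
METRICS_CSV = """model,avg_score
Gpt 5.2 High,14.846154
Claude Opus 4.5,14.461538
Kimi K2,10.307692
"""

# Numbers as quoted in the prose (here, the CLAUDE.md values from the table above).
CLAIMED = {
    "Claude Opus 4.5": 15.88,
    "Kimi K2": 14.53,
}


def load_avg_scores(text):
    """Parse CSV text into a {model: avg_score} mapping."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["model"]: float(row["avg_score"]) for row in reader}


def find_mismatches(claimed, actual, tol=0.01):
    """Return (model, claimed_value, actual_value) for every claim outside tolerance.

    A model missing from the data is reported with actual_value set to None.
    """
    mismatches = []
    for model, value in claimed.items():
        if model not in actual:
            mismatches.append((model, value, None))
        elif abs(actual[model] - value) > tol:
            mismatches.append((model, value, actual[model]))
    return mismatches


actual = load_avg_scores(METRICS_CSV)
for model, claimed, found in find_mismatches(CLAIMED, actual):
    print(f"{model}: text says {claimed}, data says {found}")
```

The tolerance of 0.01 lets rounded figures such as 14.85 pass against 14.846154 while still flagging real discrepancies.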
|
| 24 |
+
|
| 25 |
+
### 1.2 Results Section: Scattered Narrative
|
| 26 |
+
|
| 27 |
+
The guessing behavior story is currently spread across:
|
| 28 |
+
|
| 29 |
+
1. "Confidence and Calibration" - calibration curves, confidence distribution
|
| 30 |
+
2. "Guessing Strategy" - score vs failed guesses
|
| 31 |
+
3. "The Caution-Recklessness Trade-off" - early correct turns, caution scatter
|
| 32 |
+
4. "Alternative Scoring Systems" - score stack breakdown
|
| 33 |
+
5. "Analysis of the reckless guessing behavior" - double-down rate
|
| 34 |
+
|
| 35 |
+
These all address the same fundamental question: **How do models decide when to commit?** But the current structure forces readers to piece together the story themselves.
|
| 36 |
+
|
| 37 |
+
**Problem:** A reader finishing the Results section doesn't have a clear mental model of "what makes some models better than others."
|
| 38 |
+
|
| 39 |
+
---
|
| 40 |
+
|
| 41 |
+
## 2. Suggested Restructuring
|
| 42 |
+
|
| 43 |
+
### Option A: Reorganize Around the Key Insight
|
| 44 |
+
|
| 45 |
+
**Proposed Results structure:**
|
| 46 |
+
|
| 47 |
+
```
|
| 48 |
+
## Results
|
| 49 |
+
|
| 50 |
+
### Overall Performance (keep as-is)
|
| 51 |
+
Brief overview, scatter plot of score vs tokens
|
| 52 |
+
|
| 53 |
+
### Finding the Rule: Who Gets It Right?
|
| 54 |
+
- Success rates by model
|
| 55 |
+
- Performance by rule complexity
|
| 56 |
+
- Brief: what capabilities matter for finding rules
|
| 57 |
+
|
| 58 |
+
### Knowing When You Know: The Metacognition Challenge
|
| 59 |
+
[This is the heart of the article - elevate it]
|
| 60 |
+
- The caution-recklessness trade-off (central framing)
|
| 61 |
+
- Caution analysis: early correct turns, GPT 5.2 waits too long
|
| 62 |
+
- Recklessness analysis: failed guesses, double-down rates
|
| 63 |
+
- The scatter plot showing the trade-off (Figure 6)
|
| 64 |
+
- Why Claude Opus wins: good enough at finding + great at timing
|
| 65 |
+
|
| 66 |
+
### Confidence and Calibration
|
| 67 |
+
- Calibration curves (all models overconfident)
|
| 68 |
+
- Confidence distribution when guessing
|
| 69 |
+
- Brief: why calibration enables good timing decisions
|
| 70 |
+
|
| 71 |
+
### Alternative Scoring: Robustness Check
|
| 72 |
+
- Score stack shows the penalty different behaviors pay
|
| 73 |
+
- Confirms that metacognition, not just rule-finding, drives scores
|
| 74 |
+
```
|
| 75 |
+
|
| 76 |
+
**Benefits:**
|
| 77 |
+
- The key message (metacognition matters) becomes structurally prominent
|
| 78 |
+
- Reader builds understanding progressively: first "can they solve it?", then "do they know when they've solved it?"
|
| 79 |
+
- Eliminates the feeling of "lots of charts, hard to synthesize"
|
| 80 |
+
|
| 81 |
+
### Option B: Two-Act Structure
|
| 82 |
+
|
| 83 |
+
```
|
| 84 |
+
## Results
|
| 85 |
+
|
| 86 |
+
### Act 1: The Leaderboard (compact)
|
| 87 |
+
- Overall performance scatter
|
| 88 |
+
- Success rates
|
| 89 |
+
- One paragraph summary: "Models vary from 70% to 96% success rate..."
|
| 90 |
+
|
| 91 |
+
### Act 2: The Real Story—Scientific Temperaments
|
| 92 |
+
[Frame models as having distinct "personalities"]
|
| 93 |
+
|
| 94 |
+
The Cautious Achiever: GPT 5.2 High
|
| 95 |
+
- Highest success rate, but 3rd in score per the current text (summary.txt ranks it 1st by avg_score; resolve per section 1.1)
|
| 96 |
+
- Figure: excess caution distribution
|
| 97 |
+
- Lost ~3.6 points per round to over-caution
|
| 98 |
+
|
| 99 |
+
The Balanced Scientist: Claude Opus 4.5
|
| 100 |
+
- Not the best at finding rules, but best at knowing when
|
| 101 |
+
- Commits quickly, accepts occasional wrong guesses
|
| 102 |
+
|
| 103 |
+
The Reckless Guesser: Claude Haiku 4.5 / DeepSeek R1
|
| 104 |
+
- Commits before sufficient evidence
|
| 105 |
+
- Double-down behavior after failures
|
| 106 |
+
|
| 107 |
+
Visualizing the Trade-off
|
| 108 |
+
- Caution vs recklessness scatter (the key figure)
|
| 109 |
+
- Score stack showing what each "personality" costs
|
| 110 |
+
|
| 111 |
+
### Calibration: Why Timing Is Hard
|
| 112 |
+
- Overconfidence makes timing decisions unreliable
|
| 113 |
+
- Even well-performing models are poorly calibrated
|
| 114 |
+
```
|
| 115 |
+
|
| 116 |
+
**Benefits:**
|
| 117 |
+
- Memorable framing (scientific personalities)
|
| 118 |
+
- Natural story arc
|
| 119 |
+
- Each model type is clearly characterized
|
| 120 |
+
|
| 121 |
+
---
|
| 122 |
+
|
| 123 |
+
## 3. Missing Content
|
| 124 |
+
|
| 125 |
+
### 3.1 Figures Marked as TODO
|
| 126 |
+
|
| 127 |
+
- **Learning curves figure** (analysis.mdx:22) - Would show within-round dynamics
|
| 128 |
+
- **Failure mode distribution** (analysis.mdx:55) - Stacked bar by model
|
| 129 |
+
|
| 130 |
+
**Recommendation:** The learning curves figure would be valuable if you have the data. The failure mode classification might be hard to automate reliably—consider whether a few qualitative examples serve the purpose better.
|
| 131 |
+
|
| 132 |
+
### 3.2 Human Baseline
|
| 133 |
+
|
| 134 |
+
The missing human baseline is mentioned in the limitations, but it is a significant gap. Without human performance, readers can't judge if 92% success is impressive or trivial.
|
| 135 |
+
|
| 136 |
+
**Options:**
|
| 137 |
+
- Run a small human study (even N=5 would help)
|
| 138 |
+
- Cite related work on human performance in similar inductive reasoning tasks
|
| 139 |
+
- Frame it explicitly as "relative comparison between models" not absolute capability assessment
|
| 140 |
+
|
| 141 |
+
### 3.3 Example Turn Figure
|
| 142 |
+
|
| 143 |
+
benchmark.mdx shows the JSON output format but doesn't illustrate what a complete turn looks like in context (game state → reasoning → decision).
|
| 144 |
+
|
| 145 |
+
**Recommendation:** Add a figure showing:
|
| 146 |
+
```
|
| 147 |
+
[Current board state visualization]
|
| 148 |
+
[Model reasoning excerpt]
|
| 149 |
+
[Decision: play 4♣, confidence 6, don't guess yet]
|
| 150 |
+
[Outcome: accepted/rejected]
|
| 151 |
+
```
|
| 152 |
+
|
| 153 |
+
This makes the task concrete for readers.
|
| 154 |
+
|
| 155 |
+
---
|
| 156 |
+
|
| 157 |
+
## 4. The "Deeper Analysis" Section
|
| 158 |
+
|
| 159 |
+
This section is currently a grab-bag of interesting observations with TODOs. Your instinct to replace it with a "Discussion" section is right.
|
| 160 |
+
|
| 161 |
+
### Proposed: Discussion Section
|
| 162 |
+
|
| 163 |
+
```
|
| 164 |
+
## Discussion
|
| 165 |
+
|
| 166 |
+
### What Explains the Performance Gap?
|
| 167 |
+
- Metacognition (knowing when you know) is the key differentiator
|
| 168 |
+
- Success rate alone doesn't predict score (GPT 5.2 vs Opus example)
|
| 169 |
+
- Calibration enables good timing, but no model is well-calibrated
|
| 170 |
+
|
| 171 |
+
### Open vs Proprietary Models
|
| 172 |
+
- Kimi K2 competitive on rule-finding
|
| 173 |
+
- But open models trend toward reckless guessing (training objective differences?)
|
| 174 |
+
- Opportunity: calibration tuning could improve open model performance
|
| 175 |
+
|
| 176 |
+
### Failure Modes [keep the accordion, it's useful]
|
| 177 |
+
|
| 178 |
+
### Implications for AI-Assisted Science
|
| 179 |
+
- The caution-recklessness trade-off mirrors real scientific decision-making
|
| 180 |
+
- An overconfident AI assistant could lead researchers astray
|
| 181 |
+
- An overcautious one wastes resources on unnecessary verification
|
| 182 |
+
```
|
| 183 |
+
|
| 184 |
+
### Move to Appendix
|
| 185 |
+
|
| 186 |
+
- Symmetric rules analysis (interesting but niche)
|
| 187 |
+
- Confirmation bias (preliminary, needs more work)
|
| 188 |
+
- Detailed qualitative examples (unless you expand them significantly)
|
| 189 |
+
|
| 190 |
+
---
|
| 191 |
+
|
| 192 |
+
## 5. Framing Suggestions
|
| 193 |
+
|
| 194 |
+
### 5.1 Lead with the Surprise
|
| 195 |
+
|
| 196 |
+
Current opening of Results is fine, but the key insight (metacognition matters) comes too late. Consider foreshadowing in the introduction:
|
| 197 |
+
|
| 198 |
+
> "We found something surprising: the model with the highest success rate doesn't have the highest score. What matters isn't just finding the answer—it's knowing when you've found it."
|
| 199 |
+
|
| 200 |
+
### 5.2 The "Scientific Personality" Frame
|
| 201 |
+
|
| 202 |
+
This is potentially memorable and shareable. Models as:
|
| 203 |
+
- **The Perfectionist** (GPT 5.2 High): Always wants more evidence
|
| 204 |
+
- **The Pragmatist** (Claude Opus 4.5): Good enough evidence is enough
|
| 205 |
+
- **The Gambler** (Claude Haiku 4.5): Guesses based on vibes
|
| 206 |
+
|
| 207 |
+
This framing:
|
| 208 |
+
- Makes the article more accessible to non-specialists
|
| 209 |
+
- Creates natural anchors for discussion
|
| 210 |
+
- Is scientifically defensible (behavioral clustering is real)
|
| 211 |
+
|
| 212 |
+
### 5.3 The Decision Theory Angle
|
| 213 |
+
|
| 214 |
+
You mention the optimal guessing threshold (0.67 confidence) briefly. This could be expanded:
|
| 215 |
+
|
| 216 |
+
> "Given perfect calibration, the optimal strategy is to guess whenever confidence exceeds 67%. But no model is well-calibrated. GPT 5.2 High effectively uses a threshold of ~95%; Claude Haiku 4.5 seems to use ~50%."
|
| 217 |
+
|
| 218 |
+
This quantifies the "personalities" and connects to calibration.
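
To make the threshold argument concrete, here is a minimal sketch of the break-even calculation under a simplified one-shot model. The payoff values are hypothetical: the benchmark's actual scoring rule is not restated here, and a wrong guess costing twice what a correct guess gains is only one payoff ratio that happens to yield a 0.67 threshold. The function names are illustrative as well.

```python
def break_even_confidence(gain, penalty):
    """Confidence p at which guessing has zero expected value.

    Solves p * gain - (1 - p) * penalty = 0 for p.
    """
    return penalty / (gain + penalty)


def should_guess(confidence, gain, penalty):
    """A perfectly calibrated player guesses only above the break-even point."""
    return confidence > break_even_confidence(gain, penalty)


# Hypothetical payoffs: a wrong guess costs twice what a correct guess gains.
GAIN, PENALTY = 1.0, 2.0

print(round(break_even_confidence(GAIN, PENALTY), 2))
print(should_guess(0.95, GAIN, PENALTY))
print(should_guess(0.50, GAIN, PENALTY))
```

Under these assumed payoffs, a model that only commits at 95% confidence forgoes guesses with positive expected value, while one that commits at 50% accepts guesses with negative expected value.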
|
| 219 |
+
|
| 220 |
+
---
|
| 221 |
+
|
| 222 |
+
## 6. Minor Issues
|
| 223 |
+
|
| 224 |
+
### 6.1 Typos/Grammar
|
| 225 |
+
|
| 226 |
+
- results.mdx:38: "overconfident : for instance" → extra space before colon
|
| 227 |
+
- results.mdx:39: "GPT 5.2 is the best calibrated" → should be "GPT 5.2 High"
|
| 228 |
+
- results.mdx:51: "closed to Claude Opus 4.5" → "close to"
|
| 229 |
+
- results.mdx:103: "constrats" → "contrasts"
|
| 230 |
+
- analysis.mdx:60: "GPT OSS 120B also performs respectably at 12.0" → check number (summary.txt shows avg_score 7.44 and avg_floored_score 12.87)
|
| 231 |
+
|
| 232 |
+
### 6.2 Caption Numbering
|
| 233 |
+
|
| 234 |
+
Figure 7 appears twice (score-stack and reckless-guessing). Fix numbering.
|
| 235 |
+
|
| 236 |
+
### 6.3 Model Names Consistency
|
| 237 |
+
|
| 238 |
+
Inconsistent capitalization and naming:
|
| 239 |
+
- "Claude Opus 4.5" vs "Claude 4.5 Opus"
|
| 240 |
+
- "GPT 5.2 High" vs "Gpt 5.2 High" (in data files)
|
| 241 |
+
- "DeepSeek R1" vs "Deepseek R1"
|
| 242 |
+
|
| 243 |
+
---
|
| 244 |
+
|
| 245 |
+
## 7. Ideas for Additional Content
|
| 246 |
+
|
| 247 |
+
### 7.1 Interactive "Play a Round" Demo
|
| 248 |
+
|
| 249 |
+
Let readers play one round against a rule to experience the task. Even a simple version would be compelling. (This could be a stretch goal.)
|
| 250 |
+
|
| 251 |
+
### 7.2 Model-Specific Breakdowns
|
| 252 |
+
|
| 253 |
+
You have per-model PNG files (`model_claude_opus_4_5.png`, etc.). Consider:
|
| 254 |
+
- Appendix section with one page per model
|
| 255 |
+
- Or: expandable accordion for each model's detailed stats
|
| 256 |
+
|
| 257 |
+
### 7.3 Token Efficiency Discussion
|
| 258 |
+
|
| 259 |
+
You show score vs tokens in Figure 1 but don't discuss it much. Gemini 3 Flash achieves decent results with roughly a quarter of Opus's output tokens per turn (about 1,200 vs 5,100 in `summary.txt`). Is that worth highlighting for practitioners?
|
| 260 |
+
|
| 261 |
+
### 7.4 Prompt Sensitivity
|
| 262 |
+
|
| 263 |
+
You note this as a limitation but could briefly test: what if you told models to be more cautious? More aggressive? (Could be future work suggestion.)
|
| 264 |
+
|
| 265 |
+
---
|
| 266 |
+
|
| 267 |
+
## 8. Prioritized Action Items
|
| 268 |
+
|
| 269 |
+
### Must Fix
|
| 270 |
+
1. Audit all numbers against latest data files
|
| 271 |
+
2. Fix duplicate Figure 7 numbering
|
| 272 |
+
3. Fix typos listed above
|
| 273 |
+
|
| 274 |
+
### Should Do
|
| 275 |
+
4. Reorganize Results section (Option A or B above)
|
| 276 |
+
5. Rename "Deeper Analysis" to "Discussion" and restructure
|
| 277 |
+
6. Add foreshadowing of key insight in introduction
|
| 278 |
+
|
| 279 |
+
### Nice to Have
|
| 280 |
+
7. Add example turn figure in benchmark.mdx
|
| 281 |
+
8. Expand "scientific personalities" framing
|
| 282 |
+
9. Human baseline (even informal)
|
| 283 |
+
10. Per-model detail pages in appendix
|
| 284 |
+
|
| 285 |
+
---
|
| 286 |
+
|
| 287 |
+
## 9. Summary
|
| 288 |
+
|
| 289 |
+
The benchmark and data are solid. The article's main weakness is structural: it has too many charts telling pieces of the same story without a clear narrative spine. The fix is to reorganize around **the key insight** (metacognition matters more than raw rule-finding ability) and **the key visual** (the caution-recklessness scatter plot).
|
| 290 |
+
|
| 291 |
+
Your target message—"Models differ dramatically because metacognition matters, and this is an opportunity for improvement"—is supported by the data but not yet prominently surfaced by the article structure.
|
app/src/content/assets/data/basic_metrics.csv
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:646b5eda63192bed7d4c3372c684b263db844ad6599e2cff7cd34b945e0a03da
|
| 3 |
+
size 2743
|
app/src/content/assets/data/by_rule.json
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:bedd8081e1e412f0d2453c0f6fe78153fed8433520b9e1b729fc7b11dd5b02a8
|
| 3 |
+
size 30709
|
app/src/content/assets/data/by_rule.png
CHANGED
app/src/content/assets/data/complexity_analysis.json
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a281c2834fce731ee67126dc08e307268f411c4b7ec24006d36edccd303a6e6d
|
| 3 |
+
size 2273
|
app/src/content/assets/data/complexity_analysis.png
CHANGED
app/src/content/assets/data/model_claude_haiku_4_5.png
ADDED
app/src/content/assets/data/model_claude_opus_4_5.png
ADDED
app/src/content/assets/data/model_deepseek_r1.png
ADDED
app/src/content/assets/data/model_gemini_3_flash_preview_low.png
ADDED
app/src/content/assets/data/model_gpt_5_2_high.png
ADDED
app/src/content/assets/data/model_gpt_5_mini_medium.png
ADDED
app/src/content/assets/data/model_gpt_oss_120b.png
ADDED
app/src/content/assets/data/model_gpt_oss_20b.png
ADDED
app/src/content/assets/data/model_grok_4_1_fast_reasoning.png
ADDED
app/src/content/assets/data/model_kimi_k2.png
ADDED
app/src/content/assets/data/overall_performance.json
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:67f55d87526715789a9b2c902de6acc78f69dc5fd13300eb97e511668bca8003
|
| 3 |
+
size 2303
|
app/src/content/assets/data/overall_performance.png
CHANGED
app/src/content/assets/data/reckless_guessing.json
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a708723564f2779c2600346e347e2cff985a247bc950707d7f5c58137e05395b
|
| 3 |
+
size 19220
|
app/src/content/assets/data/reckless_guessing.png
ADDED
app/src/content/assets/data/score_stack.json
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:d64dd73c3b7173b627be30fab1720d57fde169a419d6038a9dec3129a2c93a60
|
| 3 |
+
size 3723
|
app/src/content/assets/data/score_stack.png
ADDED
app/src/content/assets/data/score_vs_failed_guesses.json
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:581795032120f5075ef4f805472d19deebe0602aa6737e07bc62a35062f97758
|
| 3 |
+
size 2215
|
app/src/content/assets/data/score_vs_failed_guesses.png
CHANGED
app/src/content/assets/data/summary.txt
CHANGED
|
@@ -25,17 +25,17 @@ Loaded colors for 17 models
|
|
| 25 |
BASIC MODEL COMPARISON
|
| 26 |
============================================================
|
| 27 |
|
| 28 |
-
model rounds_played total_score avg_score total_turns total_output_tokens total_wall_clock avg_failed_guesses success_rate avg_output_tokens_per_turn wall_clock_per_turn intra_rule_variance inter_rule_variance variance_ratio
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
Gpt Oss 120B 78
|
| 36 |
-
Deepseek R1 78
|
| 37 |
-
Gpt Oss 20B 78
|
| 38 |
-
Claude Haiku 4.5 78
|
| 39 |
|
| 40 |
Saved: results/260121_78_rounds/basic_metrics.csv
|
| 41 |
Saved: results/260121_78_rounds/overall_performance.png
|
|
@@ -46,6 +46,26 @@ Saved: results/260121_78_rounds/calibration_curves.png
|
|
| 46 |
Saved: results/260121_78_rounds/calibration_curves.json
|
| 47 |
Saved: results/260121_78_rounds/confidence_distribution.png
|
| 48 |
Saved: results/260121_78_rounds/confidence_distribution.json
|
| 49 |
|
| 50 |
============================================================
|
| 51 |
BY-RULE ANALYSIS
|
|
@@ -53,32 +73,32 @@ BY-RULE ANALYSIS
|
|
| 53 |
|
| 54 |
Score by rule (sorted by avg_score):
|
| 55 |
rule_description count avg_score std_score success_rate
|
| 56 |
-
Only red cards (hearts or diamonds). 30
|
| 57 |
-
Only cards of the suit spades. 30
|
| 58 |
-
Cards must alternate between red and black colors. Any card may start the line. 30
|
| 59 |
-
Only cards with an even rank (2,4,6,8,10,12). 30
|
| 60 |
-
The card must be of a different suit than the card just before it. Any card may start the line. 30 21.
|
| 61 |
-
Card rank must have opposite odd/even parity to the previous card's rank. Any card may start the line. 30 20.
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
Suits must repeat in the cyclic order hearts → spades → clubs → diamonds → hearts... Any card may start the line. 30
|
| 69 |
-
Only cards between 1 and 7 inclusive. 30 13.
|
| 70 |
-
Only black face cards. 30
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
Each card must share at least one property with the previous card: same color, or same parity. Any card may start the line. 30
|
| 75 |
-
|
| 76 |
-
Suits must appear in pairs: card 1 and 2 same suit, cards 3 and 4 same suit (different from 1 and 2), cards 5 and 6 same suit (different from 3 and 4), etc. 30
|
| 77 |
-
|
| 78 |
-
Face cards (11-13) must be red; number cards (1-10) must be black. 30
|
| 79 |
-
Hearts and spades form Group A; clubs and diamonds form Group B. Alternate between groups. Any card may start the line. 30
|
| 80 |
-
If the previous card was red, rank must increase or be equal; if black, rank must decrease or be equal. Starting card must be between 5 and 9 inclusive. 30
|
| 81 |
-
|
| 82 |
|
| 83 |
Saved: results/260121_78_rounds/by_rule.png
|
| 84 |
Saved: results/260121_78_rounds/by_rule.json
|
|
@@ -112,22 +132,42 @@ Saved: results/260121_78_rounds/caution_vs_failed_guesses.png
|
|
| 112 |
Saved: results/260121_78_rounds/caution_vs_failed_guesses.json
|
| 113 |
|
| 114 |
============================================================
|
| 115 |
-
|
| 116 |
============================================================
|
| 117 |
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 131 |
|
| 132 |
============================================================
|
| 133 |
PER-MODEL REPORTS
|
|
|
|
| 25 |
BASIC MODEL COMPARISON
|
| 26 |
============================================================
|
| 27 |
|
| 28 |
+
model rounds_played total_score avg_score total_floored_score avg_floored_score total_turns total_output_tokens total_wall_clock avg_failed_guesses success_rate total_no_stakes_score avg_no_stakes_score avg_output_tokens_per_turn wall_clock_per_turn intra_rule_variance inter_rule_variance variance_ratio
|
| 29 |
+
Gpt 5.2 High 78 1158 14.846154 1174 15.051282 1205 3341037 73525.83 0.333333 0.961538 1505.0 19.294872 2772.644813 61.017286 25.858974 43.513162 0.594279
|
| 30 |
+
Claude Opus 4.5 78 1128 14.461538 1324 16.974359 852 4333716 86367.64 2.769231 0.923077 1598.0 20.487179 5086.521127 101.370469 87.525641 180.000684 0.486252
|
| 31 |
+
Gpt 5 Mini Medium 78 942 12.076923 1052 13.487179 1261 3618399 58345.97 1.256410 0.756410 1325.0 16.987179 2869.467883 46.269603 58.166667 115.878291 0.501963
|
| 32 |
+
Gemini 3 Flash Preview Low 78 817 10.474359 1024 13.128205 1315 1581524 12702.02 1.717949 0.769231 1226.0 15.717949 1202.679848 9.659331 61.128205 154.810427 0.394858
|
| 33 |
+
Kimi K2 78 804 10.307692 1262 16.179487 975 12281540 101346.76 4.025641 0.858974 1481.0 18.987179 12596.451282 103.945395 182.564103 343.003761 0.532251
|
| 34 |
+
Grok 4 1 Fast Reasoning 78 737 9.448718 1182 15.153846 998 8178655 120364.22 4.320513 0.884615 1441.0 18.474359 8195.045090 120.605431 109.256410 357.652821 0.305482
|
| 35 |
+
Gpt Oss 120B 78 580 7.435897 1004 12.871795 1243 3190828 24633.15 3.692308 0.756410 1279.0 16.397436 2567.037812 19.817498 186.794872 225.517949 0.828293
|
| 36 |
+
Deepseek R1 78 511 6.551282 1036 13.282051 1104 9229131 165334.16 5.064103 0.833333 1331.0 17.064103 8359.720109 149.759203 152.269231 353.910598 0.430248
|
| 37 |
+
Gpt Oss 20B 78 131 1.679487 927 11.884615 1297 7009392 62397.50 6.205128 0.717949 1206.0 15.461538 5404.311488 48.109098 230.115385 421.666496 0.545728
|
| 38 |
+
Claude Haiku 4.5 78 -37 -0.474359 894 11.461538 1254 6973411 57734.39 7.551282 0.705128 1198.0 15.358974 5560.933812 46.040183 244.730769 504.499316 0.485096
|
| 39 |
|
| 40 |
Saved: results/260121_78_rounds/basic_metrics.csv
|
| 41 |
Saved: results/260121_78_rounds/overall_performance.png
|
|
|
|
| 46 |
Saved: results/260121_78_rounds/calibration_curves.json
|
| 47 |
Saved: results/260121_78_rounds/confidence_distribution.png
|
| 48 |
Saved: results/260121_78_rounds/confidence_distribution.json
|
| 49 |
+
Saved: results/260121_78_rounds/score_stack.png
|
| 50 |
+
Saved: results/260121_78_rounds/score_stack.json
|
| 51 |
+
|
| 52 |
+
============================================================
|
| 53 |
+
COMPLEXITY ANALYSIS
|
| 54 |
+
============================================================
|
| 55 |
+
|
| 56 |
+
Optimal K for aggregated complexity: 0.42
|
| 57 |
+
Formula: complexity = cyclomatic + 0.42 * node_count
|
| 58 |
+
Correlation with success_rate: -0.612
|
| 59 |
+
|
| 60 |
+
Stats by complexity quartile:
|
| 61 |
+
complexity_bin count avg_score success_rate
|
| 62 |
+
Q1 240 18.745833 0.966667
|
| 63 |
+
Q2 150 11.246667 0.893333
|
| 64 |
+
Q3 180 11.138889 0.866667
|
| 65 |
+
Q4 210 -6.761905 0.547619
|
| 66 |
+
|
| 67 |
+
Saved: results/260121_78_rounds/complexity_analysis.png
|
| 68 |
+
Saved: results/260121_78_rounds/complexity_analysis.json
|
| 69 |
|
| 70 |
============================================================
|
| 71 |
BY-RULE ANALYSIS
|
|
|
|
| 73 |
|
| 74 |
Score by rule (sorted by avg_score):
|
| 75 |
rule_description count avg_score std_score success_rate
|
| 76 |
+
Only red cards (hearts or diamonds). 30 25.633333 2.204749 1.000000
|
| 77 |
+
Only cards of the suit spades. 30 25.200000 2.023994 1.000000
|
| 78 |
+
Cards must alternate between red and black colors. Any card may start the line. 30 25.166667 2.640315 1.000000
|
| 79 |
+
Only cards with an even rank (2,4,6,8,10,12). 30 24.300000 2.692903 1.000000
|
| 80 |
+
The card must be of a different suit than the card just before it. Any card may start the line. 30 21.666667 8.659590 0.966667
|
| 81 |
+
Card rank must have opposite odd/even parity to the previous card's rank. Any card may start the line. 30 20.666667 5.148373 1.000000
|
| 82 |
+
Only Aces (rank 1) . 30 20.233333 8.931476 0.966667
|
| 83 |
+
The card must be of a different suit than but same color as the card just before it. Any card may start the line. 30 19.866667 7.541761 1.000000
|
| 84 |
+
Only hearts, clubs, and diamonds allowed. Spades are forbidden. 30 19.533333 10.836507 0.966667
|
| 85 |
+
Only spades and diamonds. 30 19.066667 4.487018 1.000000
|
| 86 |
+
Only ranks that are prime numbers (2,3,5,7,11,13). 30 18.633333 12.527166 0.966667
|
| 87 |
+
Only face cards (11,12,13). 30 17.033333 16.044084 0.900000
|
| 88 |
+
Suits must repeat in the cyclic order hearts → spades → clubs → diamonds → hearts... Any card may start the line. 30 15.100000 12.234350 1.000000
|
| 89 |
+
Only cards between 1 and 7 inclusive. 30 13.366667 10.148835 0.966667
|
| 90 |
+
Only black face cards. 30 7.700000 16.316165 0.900000
|
| 91 |
+
Only red cards whose rank is <=7. 30 4.866667 11.227225 1.000000
|
| 92 |
+
Only cards between 5 and 9 inclusive. 30 4.666667 14.406257 0.933333
|
| 93 |
+
Alternate face and number cards. Any card may start the line. 30 0.366667 20.553519 0.733333
|
| 94 |
+
Each card must share at least one property with the previous card: same color, or same parity. Any card may start the line. 30 -1.066667 20.915154 0.666667
|
| 95 |
+
Each card must have a rank greater or equal to the previous card. Only Ace can start the line. 30 -3.433333 22.931206 0.600000
|
| 96 |
+
Suits must appear in pairs: card 1 and 2 same suit, cards 3 and 4 same suit (different from 1 and 2), cards 5 and 6 same suit (different from 3 and 4), etc. 30 -5.200000 18.917972 0.766667
|
| 97 |
+
Face cards imposes the suit: if a face card is played, the next card must match its suit. Otherwise, the next card must be a different suit than it. 30 -10.466667 13.050917 0.533333
|
| 98 |
+
Face cards (11-13) must be red; number cards (1-10) must be black. 30 -11.500000 17.814659 0.500000
|
| 99 |
+
Hearts and spades form Group A; clubs and diamonds form Group B. Alternate between groups. Any card may start the line. 30 -12.066667 16.772172 0.400000
|
| 100 |
+
If the previous card was red, rank must increase or be equal; if black, rank must decrease or be equal. Starting card must be between 5 and 9 inclusive. 30 -15.633333 15.354396 0.333333
|
| 101 |
+
Rank repeats in pairs: ranks must come in doubles: (x, x), then (y, y) with y different from x, then (z, z) with z different from y, etc. 30 -18.000000 16.103116 0.133333
|
| 102 |
|
| 103 |
Saved: results/260121_78_rounds/by_rule.png
|
| 104 |
Saved: results/260121_78_rounds/by_rule.json
|
|
|
|
| 132 |
Saved: results/260121_78_rounds/caution_vs_failed_guesses.json
|
| 133 |
|
| 134 |
============================================================
|
| 135 |
+
RECKLESS GUESSING ANALYSIS
|
| 136 |
============================================================
|
| 137 |
|
| 138 |
+
Double-Down Rate: After a wrong guess, % of next turns with another guess
|
| 139 |
+
(Only counts official guesses, not shadow/tentative guesses)
|
| 140 |
+
|
| 141 |
+
Model Wrong Guesses Next Turn Guesses Double-Down %
|
| 142 |
+
Kimi K2 314 207 65.9
|
| 143 |
+
Claude Haiku 4.5 589 362 61.5
|
| 144 |
+
Grok 4 1 Fast Reasoning 337 203 60.2
|
| 145 |
+
Gpt Oss 20B 484 290 59.9
|
| 146 |
+
Deepseek R1 395 229 58.0
|
| 147 |
+
Claude Opus 4.5 216 91 42.1
|
| 148 |
+
Gpt Oss 120B 288 108 37.5
|
| 149 |
+
Gemini 3 Flash Preview Low 134 41 30.6
|
| 150 |
+
Gpt 5 Mini Medium 98 9 9.2
|
| 151 |
+
Gpt 5.2 High 26 1 3.8
|
| 152 |
+
|
| 153 |
+
Wrong Guess Streak Statistics:
|
| 154 |
+
Model Streaks Mean Length Max Length Total Wrong
|
| 155 |
+
Kimi K2 120 2.62 14 314
|
| 156 |
+
Claude Haiku 4.5 244 2.41 16 589
|
| 157 |
+
Grok 4 1 Fast Reasoning 149 2.26 12 337
|
| 158 |
+
Gpt Oss 20B 207 2.34 13 484
|
| 159 |
+
Deepseek R1 180 2.19 9 395
|
| 160 |
+
Claude Opus 4.5 139 1.55 5 216
|
| 161 |
+
Gpt Oss 120B 184 1.57 8 288
|
| 162 |
+
Gemini 3 Flash Preview Low 97 1.38 4 134
|
| 163 |
+
Gpt 5 Mini Medium 91 1.08 3 98
|
| 164 |
+
Gpt 5.2 High 25 1.04 2 26
|
| 165 |
+
|
| 166 |
+
Longest streak: 16 consecutive wrong guesses
|
| 167 |
+
- Claude Haiku 4.5 in round 77
|
| 168 |
+
|
| 169 |
+
Saved: results/260121_78_rounds/reckless_guessing.png
|
| 170 |
+
Saved: results/260121_78_rounds/reckless_guessing.json
|
| 171 |
|
| 172 |
============================================================
|
| 173 |
PER-MODEL REPORTS
|
app/src/content/chapters/eleusis/benchmark.mdx
CHANGED
|
@@ -26,9 +26,9 @@ On each turn, the player selects a card from their hand to play. If the card sat
|
|
| 26 |
|
| 27 |
When correctly guessing the rule, the player scores as many points as the number of remaining turns, and each wrong guess deducts a penalty of 2 points:
|
| 28 |
|
| 29 |
-
$$\text{score} = (30 - \text{turns\
|
| 30 |
|
| 31 |
-
A player who correctly identifies the rule on turn
|
| 32 |
|
| 33 |
### Rule Library
|
| 34 |
|
|
|
|
| 26 |
|
| 27 |
When correctly guessing the rule, the player scores as many points as the number of remaining turns, and each wrong guess deducts a penalty of 2 points:
|
| 28 |
|
| 29 |
+
$$\text{score} = (30 - \text{turns\_elapsed} + 1) - 2 \times \text{num\_wrong\_guesses}$$
|
| 30 |
|
| 31 |
+
A player who correctly identifies the rule on turn 13 with no wrong guesses scores 18 points; one who made 3 wrong guesses along the way scores only 12. Failing to identify the rule scores 0, but penalties for wrong guesses still apply, so the final score can be negative. This creates an interesting tension: guessing early yields more points if correct, but wrong guesses are costly. The optimal strategy requires accurately assessing one's own confidence, exactly the calibration we want to measure.
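The formula and worked numbers above can be sketched as a small function. This is only an illustration: `roundScore`, its parameters, and the constant names are hypothetical and are not taken from the benchmark code.

```javascript
// Hypothetical helper mirroring the scoring formula described above.
const MAX_TURNS = 30;          // turns per round, per the formula
const WRONG_GUESS_PENALTY = 2; // points deducted per wrong guess

function roundScore(turnsElapsed, numWrongGuesses, solved = true) {
  // A solved round earns one point per remaining turn (inclusive of the current one).
  const base = solved ? MAX_TURNS - turnsElapsed + 1 : 0;
  // Penalties apply whether or not the rule was eventually found.
  return base - WRONG_GUESS_PENALTY * numWrongGuesses;
}

console.log(roundScore(13, 0));        // 18
console.log(roundScore(13, 3));        // 12
console.log(roundScore(30, 4, false)); // -8
```

The third call shows how an unsolved round with four wrong guesses ends with a negative score.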
|
| 32 |
|
| 33 |
### Rule Library
|
| 34 |
|
app/src/content/chapters/eleusis/results.mdx
CHANGED
|
@@ -3,13 +3,6 @@ import Wide from "../../../components/Wide.astro";
|
|
| 3 |
import Note from "../../../components/Note.astro";
|
| 4 |
import Sidenote from "../../../components/Sidenote.astro";
|
| 5 |
import HtmlEmbed from "../../../components/HtmlEmbed.astro";
|
| 6 |
-
import calibrationCurves from "../../assets/data/calibration_curves.png";
|
| 7 |
-
import confidenceDistribution from "../../assets/data/confidence_distribution.png";
|
| 8 |
-
import scoreVsFailedGuesses from "../../assets/data/score_vs_failed_guesses.png";
|
| 9 |
-
import cautionVsFailedGuesses from "../../assets/data/caution_vs_failed_guesses.png";
|
| 10 |
-
import excessCaution from "../../assets/data/excess_caution.png";
|
| 11 |
-
import byRule from "../../assets/data/by_rule.png";
|
| 12 |
-
import complexityAnalysis from "../../assets/data/complexity_analysis.png";
|
| 13 |
|
| 14 |
## Results
|
| 15 |
|
|
@@ -34,12 +27,10 @@ Deepseek R1, an open-weight model specialized for reasoning tasks, lags behind a
|
|
| 34 |
|
| 35 |
Models are asked to output their confidence level, with clear instructions on what it means (7 = 70% probability of being correct, etc.). Even when they don't guess, they report their tentative rule. When confidence is ≥5, we test whether they would have guessed correctly, even if they didn't formally attempt to guess. This allows us to evaluate calibration: does reported confidence match actual accuracy?
|
| 36 |
|
| 37 |
-
<
|
| 38 |
-
src=
|
| 39 |
-
|
| 40 |
-
caption="<strong>Figure 2:</strong> Calibration curves for each model. A perfectly calibrated model would follow the diagonal. Points below the line indicate overconfidence : they correspond to confidence levels where actual success rates are lower than reported."
|
| 41 |
id="fig-calibration"
|
| 42 |
-
zoomable
|
| 43 |
/>
|
| 44 |
|
| 45 |
The calibration analysis reveals several patterns:
|
|
@@ -51,29 +42,27 @@ The calibration analysis reveals several patterns:
|
|
| 51 |
|
| 52 |
It is also interesting to examine the distribution of confidence levels when models choose to guess.
|
| 53 |
|
| 54 |
-
<
|
| 55 |
-
src=
|
| 56 |
-
|
| 57 |
-
caption="<strong>Figure 3:</strong> Distribution of confidence levels. Left: when models choose to formally guess. Right: when models choose not to guess. Well-calibrated models should show clear separation between these distributions."
|
| 58 |
id="fig-confidence"
|
| 59 |
-
zoomable
|
| 60 |
/>
|
| 61 |
|
| 62 |
We can see that some models like Grok 4.1 or Gemini 3 will essentially only guess when very confident (9 or 10). Others, like GPT 5.2 High or Kimi K2, also guess at confidence level 8. Surprisingly, the best-performing model, Claude Opus 4.5, has more spread-out guessing behavior, often guessing at confidence level 7 or even 6. Claude Haiku 4.5 has the most reckless guessing behavior, mostly guessing at confidence levels 6 to 8.
|
| 63 |
|
| 64 |
Being able to separate confidence levels when guessing vs not guessing is an important metacognitive skill. Models that guess only when very confident are less likely to make wrong guesses, but may miss opportunities to commit early and gain points. Models that guess at lower confidence levels risk more wrong guesses, but can capitalize on early correct guesses. This trade-off is explored next.
|
| 65 |
|
|
|
|
|
|
|
| 66 |
|
| 67 |
### Guessing Strategy
|
| 68 |
|
| 69 |
The scoring system creates a strategic tension: guess early for more points, but wrong guesses are costly. How do models navigate this tradeoff? We can analyze their guessing efficiency by plotting average score vs average number of failed guesses per round.
|
| 70 |
|
| 71 |
-
<
|
| 72 |
-
src=
|
| 73 |
-
alt="2D scatter plot showing average score vs average number of failed guesses per round for each model"
|
| 74 |
caption="<strong>Figure 4:</strong> Score vs. failed guesses per round. Models in the upper-left are efficient (high scores, few wrong guesses). Models that guess recklessly appear on the right with low scores."
|
| 75 |
id="fig-guessing"
|
| 76 |
-
zoomable
|
| 77 |
/>
|
| 78 |
|
| 79 |
<Sidenote>
|
|
@@ -84,12 +73,10 @@ The scoring system creates a strategic tension: guess early for more points, but
|
|
| 84 |
|
| 85 |
Failed guesses tell only half the story. A model might avoid wrong guesses by being *too* cautious—waiting many turns after it already has the correct answer. To measure this, we tracked "early correct turns": how many consecutive turns a model's tentative rule was correct before it finally chose to guess.
|
| 86 |
|
| 87 |
-
<
|
| 88 |
-
src=
|
| 89 |
-
|
| 90 |
-
caption="<strong>Figure 5:</strong> Distribution of early correct turns (waiting with the correct answer). Higher values indicate excessive caution—the model knew the answer but hesitated to commit. GPT 5.2 High stands out as extremely cautious, with a median of 3 turns of unnecessary delay."
|
| 91 |
id="fig-excess-caution"
|
| 92 |
-
zoomable
|
| 93 |
/>
|
| 94 |
|
| 95 |
The results reveal striking differences in guessing personalities:
|
|
@@ -98,12 +85,10 @@ The results reveal striking differences in guessing personalities:
|
|
| 98 |
- **Claude Opus 4.5** shows excellent timing—only 0.9 early correct turns on average, meaning it commits almost immediately after finding the answer.
|
| 99 |
- **Claude Haiku 4.5** and **DeepSeek R1** are the least cautious (0.5 early turns), but this comes at a cost: they also have the highest failed guess rates.
|
| 100 |
|
| 101 |
-
<
|
| 102 |
-
src=
|
| 103 |
-
alt="Scatter plot showing caution (early correct turns) vs recklessness (failed guesses) for each model"
|
| 104 |
caption="<strong>Figure 6:</strong> The caution-recklessness trade-off. Models in the upper-left are cautious (delay correct guesses); models in the lower-right are reckless (many failed guesses). The ideal position is lower-left: quick to commit when right, rarely wrong."
|
| 105 |
id="fig-caution-reckless"
|
| 106 |
-
zoomable
|
| 107 |
/>
|
| 108 |
|
| 109 |
<Sidenote>
|
|
@@ -118,8 +103,45 @@ This visualization reveals distinct behavioral patterns:
|
|
| 118 |
|
| 119 |
* Deepseek R1 and Claude Haiku 4.5 cluster in the lower-right, being both reckless and not particularly cautious, leading to poor performance.
|
| 120 |
|
|
|
|
| 121 |
The data suggests that knowing when you know is just as important as knowing the answer. Claude Opus 4.5's strong performance comes not just from finding correct rules, but from accurate metacognition, recognizing when it has gathered enough evidence to commit, even at the risk of occasional wrong guesses.
|
| 122 |
|
| 123 |
### Performance by Rule
|
| 124 |
|
| 125 |
Not all rules are created equal. Some rules are discovered quickly by all models (e.g. "All cards must be red") while others prove consistently challenging (e.g. "increase rank after a red card, decrease after a black").
|
|
@@ -128,26 +150,21 @@ It is not easy to quantify rule complexity, as it depends on multiple factors: t
|
|
| 128 |
|
| 129 |
The following figure breaks down performance by rule across all models and runs.
|
| 130 |
|
| 131 |
-
<
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
alt="Performance breakdown by rule showing score distribution for each rule across all models"
|
| 135 |
-
caption="<strong>Figure 7:</strong> Score distribution by rule. Each row is a different rule, with individual run scores shown as points. Some rules are consistently easy for all models, while others show wide variance and lower scores, indicating higher complexity. For each rule, we computed a complexity score (see below) to analyze its impact on performance."
|
| 136 |
id="fig-by-rule"
|
| 137 |
-
|
| 138 |
/>
|
| 139 |
-
</Wide>
|
| 140 |
|
| 141 |
We can see that the most complex rules are devastating for reckless models like Claude Haiku 4.5 and DeepSeek R1, which often receive negative scores on these rules due to multiple wrong guesses. Even the best models struggle on the hardest rules, but their superior metacognition allows them to avoid catastrophic failures.
|
| 142 |
|
| 143 |
The following plot breaks down the relative score of each model (as measured by score on the rule divided by average score on all rules) against the complexity metrics of each rule.
|
| 144 |
|
| 145 |
-
<
|
| 146 |
-
src=
|
| 147 |
-
|
| 148 |
-
caption="<strong>Figure 8:</strong> Relationship between rule complexity and performance. Multiple complexity factors contribute: acceptance rate, structural complexity, and semantic difficulty."
|
| 149 |
id="fig-complexity"
|
| 150 |
-
zoomable
|
| 151 |
/>
|
| 152 |
|
| 153 |
<Note variant="info">
|
|
|
|
| 3 |
import Note from "../../../components/Note.astro";
|
| 4 |
import Sidenote from "../../../components/Sidenote.astro";
|
| 5 |
import HtmlEmbed from "../../../components/HtmlEmbed.astro";
|
| 6 |
|
| 7 |
## Results
|
| 8 |
|
|
|
|
| 27 |
|
| 28 |
Models are asked to output their confidence level, with clear instructions on what it means (7 = 70% probability of being correct, etc.). Even when they don't guess, they report their tentative rule. When confidence is ≥5, we test whether they would have guessed correctly, even if they didn't formally attempt to guess. This allows us to evaluate calibration: does reported confidence match actual accuracy?
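As a rough sketch of how such a calibration curve could be computed, the snippet below groups turns by reported confidence and compares the stated probability with the observed success rate. The function name and the `{ confidence, correct }` record shape are assumptions made for illustration; the actual analysis pipeline may be organized differently.

```javascript
// Hypothetical sketch: build calibration points from per-turn records.
// Each sample is assumed to look like { confidence: 1..10, correct: boolean }.
function calibrationCurve(samples, minConfidence = 5) {
  const buckets = new Map();
  for (const { confidence, correct } of samples) {
    if (confidence < minConfidence) continue; // only tentative rules at confidence >= 5 are tested
    const bucket = buckets.get(confidence) || { n: 0, hits: 0 };
    bucket.n += 1;
    if (correct) bucket.hits += 1;
    buckets.set(confidence, bucket);
  }
  return [...buckets.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([confidence, { n, hits }]) => ({
      confidence,
      expected: confidence / 10, // 7 means a stated 70% chance of being correct
      actual: hits / n,          // observed success rate at that level
      n,
    }));
}

const curve = calibrationCurve([
  { confidence: 9, correct: true },
  { confidence: 9, correct: false },
  { confidence: 7, correct: true },
  { confidence: 3, correct: true }, // ignored: below the confidence threshold
]);
console.log(curve);
```

A point with `actual` below `expected` corresponds to overconfidence at that confidence level.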
|
| 29 |
|
| 30 |
+
<HtmlEmbed
|
| 31 |
+
src="calibration-curves.html"
|
| 32 |
+
caption="<strong>Figure 2:</strong> Calibration curves for each model. A perfectly calibrated model would follow the diagonal. Points below the line indicate overconfidence: they correspond to confidence levels where actual success rates are lower than reported. Click legend items to show/hide models."
|
|
|
|
| 33 |
id="fig-calibration"
|
|
|
|
| 34 |
/>
|
| 35 |
|
| 36 |
The calibration analysis reveals several patterns:
|
|
|
|
| 42 |
|
| 43 |
It is also interesting to examine the distribution of confidence levels when models choose to guess.
|
| 44 |
|
| 45 |
+
<HtmlEmbed
|
| 46 |
+
src="confidence-distribution.html"
|
| 47 |
+
caption="<strong>Figure 3:</strong> Distribution of confidence levels when models choose to formally guess. Each bar shows the proportion of guesses made at that confidence level. Click legend items to show/hide models."
|
|
|
|
| 48 |
id="fig-confidence"
|
|
|
|
| 49 |
/>
|
| 50 |
|
| 51 |
We can see that some models like Grok 4.1 or Gemini 3 will essentially only guess when very confident (9 or 10). Others, like GPT 5.2 High or Kimi K2, also guess at confidence level 8. Surprisingly, the best-performing model, Claude Opus 4.5, has more spread-out guessing behavior, often guessing at confidence level 7 or even 6. Claude Haiku 4.5 has the most reckless guessing behavior, mostly guessing at confidence levels 6 to 8.
|
| 52 |
|
| 53 |
Being able to separate confidence levels when guessing vs not guessing is an important metacognitive skill. Models that guess only when very confident are less likely to make wrong guesses, but may miss opportunities to commit early and gain points. Models that guess at lower confidence levels risk more wrong guesses, but can capitalize on early correct guesses. This trade-off is explored next.
|
| 54 |
|
| 55 |
+
Note that, in principle, there is a decision-theoretic optimal confidence threshold for guessing, which depends on the scoring system. With scoring that rewards 1 point per remaining turn and deducts 2 points for a wrong guess, the optimal threshold is about 0.67 (i.e., guess when you believe your tentative rule has at least a 67% chance of being correct). Of course, this assumes perfect calibration, which none of the models achieve.
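One way to see where a threshold of 2/3 can come from is the following sketch. It rests on a simplifying assumption that is ours rather than the text's: a player who waits one more turn then guesses correctly, and a player who guesses wrong now also guesses correctly on the next turn. Let $R$ be the points available for a correct guess on the current turn and $p$ the probability that the tentative rule is correct.

$$
\begin{aligned}
\mathbb{E}[\text{guess now}] &= pR + (1-p)\,(R - 1 - 2) = R - 3(1-p),\\
\mathbb{E}[\text{wait one turn}] &= R - 1,\\
\mathbb{E}[\text{guess now}] \ge \mathbb{E}[\text{wait one turn}] &\iff 3(1-p) \le 1 \iff p \ge \tfrac{2}{3} \approx 0.67.
\end{aligned}
$$

Under this model the threshold does not depend on $R$; other assumptions about what waiting reveals would shift it.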
|
| 56 |
+
|
| 57 |
|
| 58 |
### Guessing Strategy
|
| 59 |
|
| 60 |
The scoring system creates a strategic tension: guess early for more points, but wrong guesses are costly. How do models navigate this tradeoff? We can analyze their guessing efficiency by plotting average score vs average number of failed guesses per round.
|
| 61 |
|
| 62 |
+
<HtmlEmbed
|
| 63 |
+
src="score-vs-failed-guesses.html"
|
|
|
|
| 64 |
caption="<strong>Figure 4:</strong> Score vs. failed guesses per round. Models in the upper-left are efficient (high scores, few wrong guesses). Models that guess recklessly appear on the right with low scores."
|
| 65 |
id="fig-guessing"
|
|
|
|
| 66 |
/>
|
| 67 |
|
| 68 |
<Sidenote>
|
|
|
|
| 73 |
|
| 74 |
Failed guesses tell only half the story. A model might avoid wrong guesses by being *too* cautious—waiting many turns after it already has the correct answer. To measure this, we tracked "early correct turns": how many consecutive turns a model's tentative rule was correct before it finally chose to guess.
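A minimal sketch of this metric, assuming each turn is recorded as `{ tentativeCorrect, guessed }` (a hypothetical shape, not the benchmark's actual log format), could look like this:

```javascript
// Hypothetical sketch: count how many consecutive turns the tentative rule
// was already correct before the model finally committed to a correct guess.
function earlyCorrectTurns(turns) {
  // Index of the turn where the model formally guessed and was right.
  const guessIdx = turns.findIndex((t) => t.guessed && t.tentativeCorrect);
  if (guessIdx === -1) return null; // the round was never solved

  let count = 0;
  // Walk backwards while the model held the right rule without guessing.
  for (let i = guessIdx - 1; i >= 0; i--) {
    if (!turns[i].tentativeCorrect || turns[i].guessed) break;
    count += 1;
  }
  return count;
}

const turns = [
  { tentativeCorrect: false, guessed: false },
  { tentativeCorrect: true, guessed: false },
  { tentativeCorrect: true, guessed: false },
  { tentativeCorrect: true, guessed: true },
];
console.log(earlyCorrectTurns(turns)); // 2
```

A value of 0 means the model committed on the first turn its tentative rule was correct; larger values indicate points left on the table.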
|
| 75 |
|
| 76 |
+
<HtmlEmbed
|
| 77 |
+
src="excess-caution.html"
|
| 78 |
+
caption="<strong>Figure 5:</strong> Distribution of early correct turns (waiting with the correct answer). Higher values indicate excessive caution—the model knew the answer but hesitated to commit. GPT 5.2 High stands out as extremely cautious, with a mean of 3.6 turns of unnecessary delay."
|
|
|
|
| 79 |
id="fig-excess-caution"
|
|
|
|
| 80 |
/>
|
| 81 |
|
| 82 |
The results reveal striking differences in guessing personalities:
|
|
|
|
| 85 |
- **Claude Opus 4.5** shows excellent timing—only 0.9 early correct turns on average, meaning it commits almost immediately after finding the answer.
|
| 86 |
- **Claude Haiku 4.5** and **DeepSeek R1** are the least cautious (0.5 early turns), but this comes at a cost: they also have the highest failed guess rates.
|
| 87 |
|
| 88 |
+
<HtmlEmbed
|
| 89 |
+
src="caution-vs-failed-guesses.html"
|
|
|
|
| 90 |
caption="<strong>Figure 6:</strong> The caution-recklessness trade-off. Models in the upper-left are cautious (delay correct guesses); models in the lower-right are reckless (many failed guesses). The ideal position is lower-left: quick to commit when right, rarely wrong."
|
| 91 |
id="fig-caution-reckless"
|
|
|
|
| 92 |
/>
|
| 93 |
|
| 94 |
<Sidenote>
|
|
|
|
| 103 |
|
| 104 |
* Deepseek R1 and Claude Haiku 4.5 cluster in the lower-right, being both reckless and not particularly cautious, leading to poor performance.
|
| 105 |
|
| 106 |
+
|
| 107 |
The data suggests that knowing when you know is just as important as knowing the answer. Claude Opus 4.5's strong performance comes not just from finding correct rules, but from accurate metacognition, recognizing when it has gathered enough evidence to commit, even at the risk of occasional wrong guesses.
|
| 108 |
|
| 109 |
+
This analysis contrasts two ways of losing points: being too cautious (waiting too long to commit) and being too reckless (making too many wrong guesses). One way to visualize this is to explore alternative scoring systems, as we do next.
|
| 110 |
+
|
| 111 |
+
|
| 112 |
+
|
| 113 |
+
### Alternative Scoring Systems
|
| 114 |
+
|
| 115 |
+
The Eleusis scoring system includes harsh penalties: wrong guesses cost 2 points each, and rounds can end with negative scores. How much do these penalties affect rankings? To understand the impact of our scoring choices, we compare three scoring variants:
|
| 116 |
+
|
| 117 |
+
1. **Raw score**: The standard scoring (30 - turns + 1 - 2×wrong guesses)
|
| 118 |
+
2. **Floored score**: Same formula, but negative scores are counted as zero
|
| 119 |
+
3. **No-stakes score**: No penalty for wrong guesses, and tentative rules count as guesses
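The three variants listed above can be sketched side by side. The record fields (`solvedTurn`, `wrongGuesses`, `firstCorrectTentativeTurn`) and the function name are hypothetical, and the no-stakes variant is interpreted here as scoring the first turn on which the tentative rule was correct; the actual analysis code may define it differently.

```javascript
// Hypothetical sketch of the three scoring variants for a single round.
const MAX_TURNS = 30;
const WRONG_GUESS_PENALTY = 2;

// Points for succeeding on a given turn; null means it never happened.
const turnPoints = (turn) => (turn == null ? 0 : MAX_TURNS - turn + 1);

function scoreVariants({ solvedTurn, wrongGuesses, firstCorrectTentativeTurn }) {
  // Standard scoring: turn points minus wrong-guess penalties.
  const raw = turnPoints(solvedTurn) - WRONG_GUESS_PENALTY * wrongGuesses;
  // Same formula, but a negative round counts as zero.
  const floored = Math.max(0, raw);
  // No penalties; the tentative rule is treated as a guess on every turn.
  const noStakes = turnPoints(firstCorrectTentativeTurn);
  return { raw, floored, noStakes };
}

console.log(scoreVariants({ solvedTurn: 13, wrongGuesses: 3, firstCorrectTentativeTurn: 11 }));
// { raw: 12, floored: 12, noStakes: 20 }
console.log(scoreVariants({ solvedTurn: null, wrongGuesses: 4, firstCorrectTentativeTurn: null }));
// { raw: -8, floored: 0, noStakes: 0 }
```

In these terms, the flooring gain is `floored - raw` and the no-stakes gain is `noStakes - floored`, each averaged over rounds.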
|
| 120 |
+
|
| 121 |
+
<HtmlEmbed
|
| 122 |
+
src="score-stack.html"
|
| 123 |
+
caption="<strong>Figure 7:</strong> Score breakdown under alternative scoring systems. Blue shows raw score (standard scoring). Orange shows flooring gain (what models gain if negative scores count as 0). Green shows no-stakes gain (additional gain from removing wrong-guess penalties). Models sorted by total no-stakes score."
|
| 124 |
+
id="fig-score-stack"
|
| 125 |
+
wide
|
| 126 |
+
/>
|
| 127 |
+
|
| 128 |
+
The flooring gain (orange) reveals which models frequently go negative. GPT 5.2 High gains almost nothing from flooring (0.2 points), indicating it rarely makes enough wrong guesses to go negative. In contrast, Claude Haiku 4.5 gains 11.9 points—nearly 12 points of damage averted per round on average—showing how its reckless guessing leads to catastrophic losses.
|
| 129 |
+
|
| 130 |
+
The no-stakes gain (green) shows what models would gain if we simply tested their tentative rule each turn. Interestingly, this gain is relatively consistent across models (2.5–4.2 points), suggesting that most models form correct hypotheses at similar rates, but differ dramatically in their ability to *recognize* when they have the right answer.
|
| 131 |
+
|
| 132 |
+
Under any scoring system, Claude Opus 4.5 and GPT 5.2 High remain the top performers. The ranking compression at no-stakes scores (15.4 to 20.5 vs raw -0.5 to 14.8) confirms that our scoring system appropriately rewards good metacognition—knowing when you know.
|
| 133 |
+
|
| 134 |
+
|
| 135 |
+
### Analysis of Reckless Guessing Behavior
|
| 136 |
+
|
| 137 |
+
Some models lose a lot of points due to reckless guessing. In the "no-stakes" scoring system, Claude Opus 4.5 takes the lead, while Kimi K2 and Grok 4.1 perform similarly to GPT 5.2 High.
|
| 138 |
+
|
| 139 |
+
<HtmlEmbed
|
| 140 |
+
src="reckless-guessing.html"
|
| 141 |
+
caption="<strong>Figure 7b:</strong> Double-down rate: how often a model guesses again immediately after a wrong guess. Higher values indicate more reckless behavior—the model keeps guessing despite recent failures."
|
| 142 |
+
id="fig-reckless-guessing"
|
| 143 |
+
/>
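For concreteness, the double-down rate and the wrong-guess streak statistics reported in `summary.txt` could be computed along the following lines. This is a sketch under assumptions: each round is represented as an array of per-turn outcomes (`'wrong'`, `'correct'`, or `null` for no official guess), and `recklessStats` is a hypothetical name rather than a function from the analysis code.

```javascript
// Hypothetical sketch: double-down rate and wrong-guess streaks.
// rounds: array of rounds; each round is an array of per-turn outcomes.
function recklessStats(rounds) {
  let wrongGuesses = 0;
  let nextTurnGuesses = 0;
  const streaks = [];

  for (const turns of rounds) {
    let run = 0; // length of the current run of consecutive wrong guesses
    turns.forEach((outcome, i) => {
      if (outcome === 'wrong') {
        wrongGuesses += 1;
        run += 1;
        // Double-down: any official guess on the very next turn.
        const next = turns[i + 1];
        if (next === 'wrong' || next === 'correct') nextTurnGuesses += 1;
      } else {
        if (run > 0) streaks.push(run);
        run = 0;
      }
    });
    if (run > 0) streaks.push(run); // streak still open at end of round
  }

  return {
    wrongGuesses,
    nextTurnGuesses,
    doubleDownPct: wrongGuesses ? (100 * nextTurnGuesses) / wrongGuesses : 0,
    streaks: streaks.length,
    meanStreak: streaks.length ? wrongGuesses / streaks.length : 0,
    maxStreak: streaks.length ? Math.max(...streaks) : 0,
  };
}

const stats = recklessStats([
  [null, 'wrong', 'wrong', null, 'wrong', 'correct'],
  ['wrong', null, null],
]);
console.log(stats);
```

Using total wrong guesses as the denominator is consistent with the reported table, where for example 207 next-turn guesses out of 314 wrong guesses gives 65.9%.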
|
| 144 |
+
|
| 145 |
### Performance by Rule
|
| 146 |
|
| 147 |
Not all rules are created equal. Some rules are discovered quickly by all models (e.g. "All cards must be red") while others prove consistently challenging (e.g. "increase rank after a red card, decrease after a black").
|
|
|
|
| 150 |
|
| 151 |
The following figure breaks down performance by rule across all models and runs.
|
| 152 |
|
| 153 |
+
<HtmlEmbed
|
| 154 |
+
src="by-rule.html"
|
| 155 |
+
caption="<strong>Figure 8:</strong> Score distribution by rule. Each row is a different rule, with individual run scores shown as colored dots (one per model run). Hover over rule names for details. The left column shows average success rate. Click legend items to show/hide models."
|
|
|
|
|
|
|
| 156 |
id="fig-by-rule"
|
| 157 |
+
wide
|
| 158 |
/>
|
|
|
|
| 159 |
|
| 160 |
We can see that the most complex rules are devastating for reckless models like Claude Haiku 4.5 and DeepSeek R1, which often receive negative scores on these rules due to multiple wrong guesses. Even the best models struggle on the hardest rules, but their superior metacognition allows them to avoid catastrophic failures.
|
| 161 |
|
| 162 |
The following plot breaks down the relative score of each model (as measured by score on the rule divided by average score on all rules) against the complexity metrics of each rule.
|
| 163 |
|
| 164 |
+
<HtmlEmbed
|
| 165 |
+
src="complexity-analysis.html"
|
| 166 |
+
caption="<strong>Figure 9:</strong> Relationship between rule complexity and model performance. The heatmap shows relative scores (value > 1 means above-average performance) for each model across complexity quartiles. Hover over cells for details."
|
|
|
|
| 167 |
id="fig-complexity"
|
|
|
|
| 168 |
/>
|
| 169 |
|
| 170 |
<Note variant="info">
|
app/src/content/embeds/by-rule.html
ADDED
|
@@ -0,0 +1,521 @@
|
|
| 1 |
+
<div class="d3-by-rule"></div>
|
| 2 |
+
<style>
|
| 3 |
+
.d3-by-rule {
|
| 4 |
+
width: 100%;
|
| 5 |
+
margin: 10px 0;
|
| 6 |
+
position: relative;
|
| 7 |
+
font-family: system-ui, -apple-system, sans-serif;
|
| 8 |
+
}
|
| 9 |
+
|
| 10 |
+
.d3-by-rule svg {
|
| 11 |
+
display: block;
|
| 12 |
+
width: 100%;
|
| 13 |
+
height: auto;
|
| 14 |
+
}
|
| 15 |
+
|
| 16 |
+
.d3-by-rule .axes path,
|
| 17 |
+
.d3-by-rule .axes line {
|
| 18 |
+
stroke: var(--axis-color, var(--text-color));
|
| 19 |
+
}
|
| 20 |
+
|
| 21 |
+
.d3-by-rule .axes text {
|
| 22 |
+
fill: var(--tick-color, var(--muted-color));
|
| 23 |
+
font-size: 11px;
|
| 24 |
+
}
|
| 25 |
+
|
| 26 |
+
.d3-by-rule .grid line {
|
| 27 |
+
stroke: var(--grid-color, rgba(0,0,0,.08));
|
| 28 |
+
}
|
| 29 |
+
|
| 30 |
+
.d3-by-rule .axes text.axis-label {
|
| 31 |
+
font-size: 14px;
|
| 32 |
+
font-weight: 500;
|
| 33 |
+
fill: var(--text-color);
|
| 34 |
+
}
|
| 35 |
+
|
| 36 |
+
.d3-by-rule .x-axis text {
|
| 37 |
+
transform: translateY(4px);
|
| 38 |
+
}
|
| 39 |
+
|
| 40 |
+
.d3-by-rule .rule-label {
|
| 41 |
+
font-size: 10px;
|
| 42 |
+
fill: var(--text-color);
|
| 43 |
+
cursor: pointer;
|
| 44 |
+
}
|
| 45 |
+
|
| 46 |
+
.d3-by-rule .rule-label:hover {
|
| 47 |
+
text-decoration: underline;
|
| 48 |
+
}
|
| 49 |
+
|
| 50 |
+
.d3-by-rule .complexity-bar {
|
| 51 |
+
opacity: 0.85;
|
| 52 |
+
}
|
| 53 |
+
|
| 54 |
+
.d3-by-rule .complexity-text {
|
| 55 |
+
font-size: 9px;
|
| 56 |
+
font-weight: 600;
|
| 57 |
+
pointer-events: none;
|
| 58 |
+
}
|
| 59 |
+
|
| 60 |
+
.d3-by-rule .point {
|
| 61 |
+
opacity: 0.85;
|
| 62 |
+
transition: opacity 0.1s ease;
|
| 63 |
+
}
|
| 64 |
+
|
| 65 |
+
.d3-by-rule .point:hover {
|
| 66 |
+
opacity: 1;
|
| 67 |
+
}
|
| 68 |
+
|
| 69 |
+
.d3-by-rule .point.dimmed {
|
| 70 |
+
opacity: 0.15;
|
| 71 |
+
}
|
| 72 |
+
|
| 73 |
+
.d3-by-rule .legend-item {
|
| 74 |
+
cursor: pointer;
|
| 75 |
+
}
|
| 76 |
+
|
| 77 |
+
.d3-by-rule .legend-item.inactive .legend-dot {
|
| 78 |
+
opacity: 0.3;
|
| 79 |
+
}
|
| 80 |
+
|
| 81 |
+
.d3-by-rule .legend-item.inactive .legend-text {
|
| 82 |
+
opacity: 0.5;
|
| 83 |
+
text-decoration: line-through;
|
| 84 |
+
}
|
| 85 |
+
|
| 86 |
+
.d3-by-rule .legend-text {
|
| 87 |
+
font-size: 10px;
|
| 88 |
+
fill: var(--text-color);
|
| 89 |
+
}
|
| 90 |
+
|
| 91 |
+
.d3-by-rule .d3-tooltip {
|
| 92 |
+
position: absolute;
|
| 93 |
+
top: 0;
|
| 94 |
+
left: 0;
|
| 95 |
+
transform: translate(-9999px, -9999px);
|
| 96 |
+
pointer-events: none;
|
| 97 |
+
padding: 10px 12px;
|
| 98 |
+
border-radius: 8px;
|
| 99 |
+
font-size: 12px;
|
| 100 |
+
line-height: 1.5;
|
| 101 |
+
border: 1px solid var(--border-color);
|
| 102 |
+
background: var(--surface-bg);
|
| 103 |
+
color: var(--text-color);
|
| 104 |
+
box-shadow: 0 4px 24px rgba(0,0,0,.18);
|
| 105 |
+
opacity: 0;
|
| 106 |
+
transition: opacity 0.12s ease;
|
| 107 |
+
z-index: 10;
|
| 108 |
+
max-width: 320px;
|
| 109 |
+
}
|
| 110 |
+
|
| 111 |
+
.d3-by-rule .d3-tooltip .rule-name {
|
| 112 |
+
font-weight: 600;
|
| 113 |
+
margin-bottom: 6px;
|
| 114 |
+
}
|
| 115 |
+
|
| 116 |
+
.d3-by-rule .d3-tooltip .rule-desc {
|
| 117 |
+
margin-bottom: 8px;
|
| 118 |
+
color: var(--muted-color);
|
| 119 |
+
font-size: 11px;
|
| 120 |
+
}
|
| 121 |
+
|
| 122 |
+
.d3-by-rule .d3-tooltip .metric {
|
| 123 |
+
display: flex;
|
| 124 |
+
justify-content: space-between;
|
| 125 |
+
gap: 16px;
|
| 126 |
+
}
|
| 127 |
+
|
| 128 |
+
.d3-by-rule .d3-tooltip .metric-label {
|
| 129 |
+
color: var(--muted-color);
|
| 130 |
+
}
|
| 131 |
+
|
| 132 |
+
.d3-by-rule .d3-tooltip .metric-value {
|
| 133 |
+
font-weight: 500;
|
| 134 |
+
}
|
| 135 |
+
</style>
|
| 136 |
+
<script>
|
| 137 |
+
(() => {
|
| 138 |
+
const ensureD3 = (cb) => {
|
| 139 |
+
if (window.d3 && typeof window.d3.select === 'function') return cb();
|
| 140 |
+
let s = document.getElementById('d3-cdn-script');
|
| 141 |
+
if (!s) {
|
| 142 |
+
s = document.createElement('script');
|
| 143 |
+
s.id = 'd3-cdn-script';
|
| 144 |
+
s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
|
| 145 |
+
document.head.appendChild(s);
|
| 146 |
+
}
|
| 147 |
+
const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
|
| 148 |
+
s.addEventListener('load', onReady, { once: true });
|
| 149 |
+
if (window.d3) onReady();
|
| 150 |
+
};
|
| 151 |
+
|
| 152 |
+
const bootstrap = () => {
|
| 153 |
+
const scriptEl = document.currentScript;
|
| 154 |
+
let container = scriptEl ? scriptEl.previousElementSibling : null;
|
| 155 |
+
if (!(container && container.classList && container.classList.contains('d3-by-rule'))) {
|
| 156 |
+
const candidates = Array.from(document.querySelectorAll('.d3-by-rule'))
|
| 157 |
+
.filter((el) => !(el.dataset && el.dataset.mounted === 'true'));
|
| 158 |
+
container = candidates[candidates.length - 1] || null;
|
| 159 |
+
}
|
| 160 |
+
if (!container) return;
|
| 161 |
+
if (container.dataset) {
|
| 162 |
+
if (container.dataset.mounted === 'true') return;
|
| 163 |
+
container.dataset.mounted = 'true';
|
| 164 |
+
}
|
| 165 |
+
|
| 166 |
+
// Tooltip setup
|
| 167 |
+
container.style.position = container.style.position || 'relative';
|
| 168 |
+
const tip = document.createElement('div');
|
| 169 |
+
tip.className = 'd3-tooltip';
|
| 170 |
+
container.appendChild(tip);
|
| 171 |
+
|
| 172 |
+
// SVG setup
|
| 173 |
+
const svg = d3.select(container).append('svg');
|
| 174 |
+
const gRoot = svg.append('g');
|
| 175 |
+
|
| 176 |
+
// Chart groups
|
| 177 |
+
const gGrid = gRoot.append('g').attr('class', 'grid');
|
| 178 |
+
const gAxes = gRoot.append('g').attr('class', 'axes');
|
| 179 |
+
const gComplexity = gRoot.append('g').attr('class', 'complexity');
|
| 180 |
+
const gPoints = gRoot.append('g').attr('class', 'points');
|
| 181 |
+
const gLabels = gRoot.append('g').attr('class', 'labels');
|
| 182 |
+
const gLegend = gRoot.append('g').attr('class', 'legend');
|
| 183 |
+
|
| 184 |
+
// State
|
| 185 |
+
let data = null;
|
| 186 |
+
let modelColors = null;
|
| 187 |
+
let width = 800;
|
| 188 |
+
let height = 800;
|
| 189 |
+
const margin = { top: 20, right: 140, bottom: 50, left: 180 };
|
| 190 |
+
const complexityBarWidth = 30;
|
| 191 |
+
const complexityGap = 8;
|
| 192 |
+
|
| 193 |
+
// Active models (all visible by default)
|
| 194 |
+
let activeModels = new Set();
|
| 195 |
+
|
| 196 |
+
// Scales
|
| 197 |
+
const xScale = d3.scaleLinear();
|
| 198 |
+
const yScale = d3.scaleBand();
|
| 199 |
+
// Green to red scale: high success (1.0) = green, low success (0) = red
|
| 200 |
+
const successColorScale = d3.scaleSequential(d3.interpolateRdYlGn);
|
| 201 |
+
|
| 202 |
+
// Data loading
|
| 203 |
+
const DATA_URL = '/data/by_rule.json';
|
| 204 |
+
const COLORS_URL = '/data/overall_performance.json';
|
| 205 |
+
|
| 206 |
+
function updateSize() {
|
| 207 |
+
width = container.clientWidth || 800;
|
| 208 |
+
const numRules = data ? data.rules.length : 26;
|
| 209 |
+
const rowHeight = 24;
|
| 210 |
+
height = margin.top + margin.bottom + numRules * rowHeight;
|
| 211 |
+
svg.attr('width', width).attr('height', height).attr('viewBox', `0 0 ${width} ${height}`);
|
| 212 |
+
gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
|
| 213 |
+
return {
|
| 214 |
+
innerWidth: width - margin.left - margin.right,
|
| 215 |
+
innerHeight: height - margin.top - margin.bottom
|
| 216 |
+
};
|
| 217 |
+
}
|
| 218 |
+
|
| 219 |
+
function formatRuleName(name) {
|
| 220 |
+
return name.replace(/_/g, ' ').replace(/\b\w/g, c => c.toUpperCase());
|
| 221 |
+
}
|
| 222 |
+
|
| 223 |
+
function showRuleTooltip(event, rule) {
|
| 224 |
+
const rect = container.getBoundingClientRect();
|
| 225 |
+
const x = event.clientX - rect.left;
|
| 226 |
+
const y = event.clientY - rect.top;
|
| 227 |
+
|
| 228 |
+
tip.innerHTML = `
|
| 229 |
+
<div class="rule-name">${formatRuleName(rule.name)}</div>
|
| 230 |
+
<div class="rule-desc">${rule.description}</div>
|
| 231 |
+
<div class="metric">
|
| 232 |
+
<span class="metric-label">Success Rate:</span>
|
| 233 |
+
<span class="metric-value">${(rule.success_rate * 100).toFixed(1)}%</span>
|
| 234 |
+
</div>
|
| 235 |
+
<div class="metric">
|
| 236 |
+
<span class="metric-label">Average Score:</span>
|
| 237 |
+
<span class="metric-value">${rule.avg_score.toFixed(1)}</span>
|
| 238 |
+
</div>
|
| 239 |
+
<div class="metric">
|
| 240 |
+
<span class="metric-label">Cyclomatic Complexity:</span>
|
| 241 |
+
<span class="metric-value">${rule.cyclomatic_complexity}</span>
|
| 242 |
+
</div>
|
| 243 |
+
<div class="metric">
|
| 244 |
+
<span class="metric-label">AST Node Count:</span>
|
| 245 |
+
<span class="metric-value">${rule.node_count}</span>
|
| 246 |
+
</div>
|
| 247 |
+
<div class="metric">
|
| 248 |
+
<span class="metric-label">Aggregated Complexity:</span>
|
| 249 |
+
<span class="metric-value">${rule.aggregated_complexity.toFixed(1)}</span>
|
| 250 |
+
</div>
|
| 251 |
+
`;
|
| 252 |
+
|
| 253 |
+
const tipWidth = tip.offsetWidth || 200;
|
| 254 |
+
const tipHeight = tip.offsetHeight || 140;
|
| 255 |
+
let tipX = x + 12;
|
| 256 |
+
let tipY = y - tipHeight / 2;
|
| 257 |
+
|
| 258 |
+
if (tipX + tipWidth > width) tipX = x - tipWidth - 12;
|
| 259 |
+
if (tipY < 0) tipY = 8;
|
| 260 |
+
if (tipY + tipHeight > height) tipY = height - tipHeight - 8;
|
| 261 |
+
|
| 262 |
+
tip.style.transform = `translate(${tipX}px, ${tipY}px)`;
|
| 263 |
+
tip.style.opacity = '1';
|
| 264 |
+
}
|
| 265 |
+
|
| 266 |
+
function hideTooltip() {
|
| 267 |
+
tip.style.opacity = '0';
|
| 268 |
+
tip.style.transform = 'translate(-9999px, -9999px)';
|
| 269 |
+
}
|
| 270 |
+
|
| 271 |
+
function getContrastColor(color) {
|
| 272 |
+
// Handle both hex (#rrggbb) and rgb(r, g, b) formats
|
| 273 |
+
let r, g, b;
|
| 274 |
+
if (color.startsWith('#')) {
|
| 275 |
+
const hex = color.replace('#', '');
|
| 276 |
+
r = parseInt(hex.slice(0, 2), 16) / 255;
|
| 277 |
+
g = parseInt(hex.slice(2, 4), 16) / 255;
|
| 278 |
+
b = parseInt(hex.slice(4, 6), 16) / 255;
|
| 279 |
+
} else if (color.startsWith('rgb')) {
|
| 280 |
+
const match = color.match(/rgb\((\d+),\s*(\d+),\s*(\d+)\)/);
|
| 281 |
+
if (match) {
|
| 282 |
+
r = parseInt(match[1], 10) / 255;
|
| 283 |
+
g = parseInt(match[2], 10) / 255;
|
| 284 |
+
b = parseInt(match[3], 10) / 255;
|
| 285 |
+
} else {
|
| 286 |
+
return '#000000';
|
| 287 |
+
}
|
| 288 |
+
} else {
|
| 289 |
+
return '#000000';
|
| 290 |
+
}
|
| 291 |
+
const luminance = 0.299 * r + 0.587 * g + 0.114 * b;
|
| 292 |
+
return luminance > 0.5 ? '#000000' : '#ffffff';
|
| 293 |
+
}
|
| 294 |
+
|
| 295 |
+
function toggleModel(modelName) {
|
| 296 |
+
if (activeModels.has(modelName)) {
|
| 297 |
+
activeModels.delete(modelName);
|
| 298 |
+
} else {
|
| 299 |
+
activeModels.add(modelName);
|
| 300 |
+
}
|
| 301 |
+
render();
|
| 302 |
+
}
|
| 303 |
+
|
| 304 |
+
function render() {
|
| 305 |
+
if (!data || !modelColors) return;
|
| 306 |
+
|
| 307 |
+
const { innerWidth, innerHeight } = updateSize();
|
| 308 |
+
const rules = data.rules;
|
| 309 |
+
const chartWidth = innerWidth - complexityBarWidth - complexityGap;
|
| 310 |
+
|
| 311 |
+
// Update scales
|
| 312 |
+
const allScores = [];
|
| 313 |
+
rules.forEach(rule => {
|
| 314 |
+
Object.values(rule.scores_by_model).forEach(scores => {
|
| 315 |
+
allScores.push(...scores);
|
| 316 |
+
});
|
| 317 |
+
});
|
| 318 |
+
const scoreExtent = d3.extent(allScores);
|
| 319 |
+
const scorePadding = (scoreExtent[1] - scoreExtent[0]) * 0.05;
|
| 320 |
+
|
| 321 |
+
xScale
|
| 322 |
+
.domain([scoreExtent[0] - scorePadding, scoreExtent[1] + scorePadding])
|
| 323 |
+
.range([complexityBarWidth + complexityGap, innerWidth])
|
| 324 |
+
.nice();
|
| 325 |
+
|
| 326 |
+
yScale
|
| 327 |
+
.domain(rules.map(r => r.name))
|
| 328 |
+
.range([0, innerHeight])
|
| 329 |
+
.padding(0.3);
|
| 330 |
+
|
| 331 |
+
// Success rate domain: 0 to 1 (will display as 0% to 100%)
|
| 332 |
+
successColorScale.domain([0, 1]);
|
| 333 |
+
|
| 334 |
+
// Grid lines
|
| 335 |
+
const xTicks = xScale.ticks(8);
|
| 336 |
+
gGrid.selectAll('.grid-x')
|
| 337 |
+
.data(xTicks)
|
| 338 |
+
.join('line')
|
| 339 |
+
.attr('class', 'grid-x')
|
| 340 |
+
.attr('x1', d => xScale(d))
|
| 341 |
+
.attr('x2', d => xScale(d))
|
| 342 |
+
.attr('y1', 0)
|
| 343 |
+
.attr('y2', innerHeight);
|
| 344 |
+
|
| 345 |
+
// X-axis
|
| 346 |
+
gAxes.selectAll('.x-axis')
|
| 347 |
+
.data([0])
|
| 348 |
+
.join('g')
|
| 349 |
+
.attr('class', 'x-axis')
|
| 350 |
+
.attr('transform', `translate(0,${innerHeight})`)
|
| 351 |
+
.call(d3.axisBottom(xScale).ticks(8).tickSizeInner(-6).tickSizeOuter(0));
|
| 352 |
+
|
| 353 |
+
// X-axis label
|
| 354 |
+
gAxes.selectAll('.x-label')
|
| 355 |
+
.data([0])
|
| 356 |
+
.join('text')
|
| 357 |
+
.attr('class', 'x-label axis-label')
|
| 358 |
+
.attr('x', (complexityBarWidth + complexityGap + innerWidth) / 2)
|
| 359 |
+
.attr('y', innerHeight + 40)
|
| 360 |
+
.attr('text-anchor', 'middle')
|
| 361 |
+
.text('Score');
|
| 362 |
+
|
| 363 |
+
// Success rate bars
|
| 364 |
+
gComplexity.selectAll('.complexity-bar')
|
| 365 |
+
.data(rules, d => d.name)
|
| 366 |
+
.join('rect')
|
| 367 |
+
.attr('class', 'complexity-bar')
|
| 368 |
+
.attr('x', 0)
|
| 369 |
+
.attr('y', d => yScale(d.name))
|
| 370 |
+
.attr('width', complexityBarWidth)
|
| 371 |
+
.attr('height', yScale.bandwidth())
|
| 372 |
+
.attr('fill', d => successColorScale(d.success_rate))
|
| 373 |
+
.attr('rx', 2);
|
| 374 |
+
|
| 375 |
+
gComplexity.selectAll('.complexity-text')
|
| 376 |
+
.data(rules, d => d.name)
|
| 377 |
+
.join('text')
|
| 378 |
+
.attr('class', 'complexity-text')
|
| 379 |
+
.attr('x', complexityBarWidth / 2)
|
| 380 |
+
.attr('y', d => yScale(d.name) + yScale.bandwidth() / 2)
|
| 381 |
+
.attr('text-anchor', 'middle')
|
| 382 |
+
.attr('dominant-baseline', 'central')
|
| 383 |
+
.style('fill', d => getContrastColor(successColorScale(d.success_rate)))
|
| 384 |
+
.text(d => Math.round(d.success_rate * 100) + '%');
|
| 385 |
+
|
| 386 |
+
// Rule labels (Y-axis)
|
| 387 |
+
gLabels.selectAll('.rule-label')
|
| 388 |
+
.data(rules, d => d.name)
|
| 389 |
+
.join('text')
|
| 390 |
+
.attr('class', 'rule-label')
|
| 391 |
+
.attr('x', -8)
|
| 392 |
+
.attr('y', d => yScale(d.name) + yScale.bandwidth() / 2)
|
| 393 |
+
.attr('text-anchor', 'end')
|
| 394 |
+
.attr('dominant-baseline', 'central')
|
| 395 |
+
.text(d => formatRuleName(d.name))
|
| 396 |
+
.on('mouseenter', (event, d) => showRuleTooltip(event, d))
|
| 397 |
+
.on('mousemove', (event, d) => showRuleTooltip(event, d))
|
| 398 |
+
.on('mouseleave', hideTooltip);
|
| 399 |
+
|
| 400 |
+
// Data points
|
| 401 |
+
const pointData = [];
|
| 402 |
+
rules.forEach(rule => {
|
| 403 |
+
Object.entries(rule.scores_by_model).forEach(([modelName, scores]) => {
|
| 404 |
+
scores.forEach((score, seedIdx) => {
|
| 405 |
+
const color = modelColors[modelName] || '#888888';
|
| 406 |
+
pointData.push({
|
| 407 |
+
rule: rule.name,
|
| 408 |
+
model: modelName,
|
| 409 |
+
score: score,
|
| 410 |
+
seed: seedIdx,
|
| 411 |
+
color: color
|
| 412 |
+
});
|
| 413 |
+
});
|
| 414 |
+
});
|
| 415 |
+
});
|
| 416 |
+
|
| 417 |
+
const pointRadius = Math.max(3, Math.min(5, yScale.bandwidth() / 4));
|
| 418 |
+
const jitterStrength = yScale.bandwidth() * 0.3;
|
| 419 |
+
|
| 420 |
+
// Simple hash for consistent jitter
|
| 421 |
+
const hashStr = (str) => {
|
| 422 |
+
let hash = 0;
|
| 423 |
+
for (let i = 0; i < str.length; i++) {
|
| 424 |
+
hash = ((hash << 5) - hash) + str.charCodeAt(i);
|
| 425 |
+
hash |= 0;
|
| 426 |
+
}
|
| 427 |
+
return hash;
|
| 428 |
+
};
|
| 429 |
+
|
| 430 |
+
gPoints.selectAll('.point')
|
| 431 |
+
.data(pointData, d => `${d.rule}-${d.model}-${d.seed}`)
|
| 432 |
+
.join('circle')
|
| 433 |
+
.attr('class', d => `point ${activeModels.has(d.model) ? '' : 'dimmed'}`)
|
| 434 |
+
.attr('cx', d => xScale(d.score))
|
| 435 |
+
.attr('cy', d => {
|
| 436 |
+
const baseY = yScale(d.rule) + yScale.bandwidth() / 2;
|
| 437 |
+
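// hashStr can return a negative 32-bit value; take the absolute value so the jitter stays centered on the row
const jitter = ((Math.abs(hashStr(d.model + d.seed)) % 100) / 100 - 0.5) * jitterStrength;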
|
| 438 |
+
return baseY + jitter;
|
| 439 |
+
})
|
| 440 |
+
.attr('r', pointRadius)
|
| 441 |
+
.attr('fill', d => d.color)
|
| 442 |
+
.attr('stroke', 'var(--surface-bg)')
|
| 443 |
+
.attr('stroke-width', 0.5);
|
| 444 |
+
|
| 445 |
+
// Legend
|
| 446 |
+
const legendX = innerWidth + 15;
|
| 447 |
+
const legendItemHeight = 16;
|
| 448 |
+
const modelNames = data.models;
|
| 449 |
+
|
| 450 |
+
const legendItems = gLegend.selectAll('.legend-item')
|
| 451 |
+
.data(modelNames)
|
| 452 |
+
.join('g')
|
| 453 |
+
.attr('class', d => `legend-item ${activeModels.has(d) ? '' : 'inactive'}`)
|
| 454 |
+
.attr('transform', (d, i) => `translate(${legendX}, ${i * legendItemHeight})`)
|
| 455 |
+
.style('cursor', 'pointer')
|
| 456 |
+
.on('click', (event, d) => toggleModel(d));
|
| 457 |
+
|
| 458 |
+
legendItems.selectAll('.legend-dot')
|
| 459 |
+
.data(d => [d])
|
| 460 |
+
.join('circle')
|
| 461 |
+
.attr('class', 'legend-dot')
|
| 462 |
+
.attr('cx', 5)
|
| 463 |
+
.attr('cy', 6)
|
| 464 |
+
.attr('r', 4)
|
| 465 |
+
.attr('fill', d => modelColors[d] || '#888888');
|
| 466 |
+
|
| 467 |
+
legendItems.selectAll('.legend-text')
|
| 468 |
+
.data(d => [d])
|
| 469 |
+
.join('text')
|
| 470 |
+
.attr('class', 'legend-text')
|
| 471 |
+
.attr('x', 14)
|
| 472 |
+
.attr('y', 9)
|
| 473 |
+
.text(d => d);
|
| 474 |
+
}
|
| 475 |
+
|
| 476 |
+
// Initialize
|
| 477 |
+
Promise.all([
|
| 478 |
+
fetch(DATA_URL, { cache: 'no-cache' }).then(r => r.json()),
|
| 479 |
+
fetch(COLORS_URL, { cache: 'no-cache' }).then(r => r.json())
|
| 480 |
+
])
|
| 481 |
+
.then(([byRuleData, perfData]) => {
|
| 482 |
+
data = byRuleData;
|
| 483 |
+
// Build color map from overall_performance.json
|
| 484 |
+
modelColors = {};
|
| 485 |
+
perfData.models.forEach(m => {
|
| 486 |
+
modelColors[m.name] = m.color;
|
| 487 |
+
});
|
| 488 |
+
// Initialize all models as active
|
| 489 |
+
activeModels = new Set(data.models);
|
| 490 |
+
render();
|
| 491 |
+
})
|
| 492 |
+
.catch(err => {
|
| 493 |
+
const pre = document.createElement('pre');
|
| 494 |
+
pre.style.color = 'red';
|
| 495 |
+
pre.style.padding = '16px';
|
| 496 |
+
pre.textContent = `Error loading data: ${err.message}`;
|
| 497 |
+
container.appendChild(pre);
|
| 498 |
+
});
|
| 499 |
+
|
| 500 |
+
// Resize handling
|
| 501 |
+
if (window.ResizeObserver) {
|
| 502 |
+
new ResizeObserver(() => render()).observe(container);
|
| 503 |
+
} else {
|
| 504 |
+
window.addEventListener('resize', render);
|
| 505 |
+
}
|
| 506 |
+
|
| 507 |
+
// Theme change handling
|
| 508 |
+
const observer = new MutationObserver(() => render());
|
| 509 |
+
observer.observe(document.documentElement, {
|
| 510 |
+
attributes: true,
|
| 511 |
+
attributeFilter: ['data-theme']
|
| 512 |
+
});
|
| 513 |
+
};
|
| 514 |
+
|
| 515 |
+
if (document.readyState === 'loading') {
|
| 516 |
+
document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
|
| 517 |
+
} else {
|
| 518 |
+
ensureD3(bootstrap);
|
| 519 |
+
}
|
| 520 |
+
})();
|
| 521 |
+
</script>
|
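The point jitter in by-rule.html is derived from a 32-bit string hash of the model name plus the seed index. The following is a minimal standalone sketch of that idea, not the embed's actual code: `jitterFor` is a hypothetical helper name, the keys in the usage lines are made-up examples, and folding the hash with `Math.abs` is an assumption added here so the offset stays symmetric around the row center.

```javascript
// Hypothetical standalone helpers; the names mirror the embed but are not its API.

// 32-bit string hash (the common "hash * 31 + charCode" form).
function hashStr(str) {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = ((hash << 5) - hash) + str.charCodeAt(i);
    hash |= 0; // wrap to a signed 32-bit integer, so the result may be negative
  }
  return hash;
}

// Map a key to a stable vertical offset in [-strength / 2, strength / 2).
function jitterFor(key, strength) {
  const bucket = Math.abs(hashStr(key)) % 100; // integer in 0..99
  return (bucket / 100 - 0.5) * strength;
}

// Usage: the same key always yields the same offset, so points do not
// move between re-renders (resize, theme change, legend toggle).
const exampleKeys = ['model_a0', 'model_a1', 'model_b0']; // made-up keys
const offsets = exampleKeys.map((key) => jitterFor(key, 10));
console.log(offsets);
```

Because the offset depends only on the key, this approach needs no stored random state, at the cost of a coarse 100-step distribution.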
app/src/content/embeds/calibration-curves.html
ADDED
|
@@ -0,0 +1,537 @@
|
| 1 |
+
<div class="d3-calibration-curves"></div>
|
| 2 |
+
<style>
|
| 3 |
+
.d3-calibration-curves {
|
| 4 |
+
width: 100%;
|
| 5 |
+
margin: 10px 0;
|
| 6 |
+
position: relative;
|
| 7 |
+
font-family: system-ui, -apple-system, sans-serif;
|
| 8 |
+
}
|
| 9 |
+
|
| 10 |
+
.d3-calibration-curves svg {
|
| 11 |
+
display: block;
|
| 12 |
+
width: 100%;
|
| 13 |
+
height: auto;
|
| 14 |
+
}
|
| 15 |
+
|
| 16 |
+
.d3-calibration-curves .axes path,
|
| 17 |
+
.d3-calibration-curves .axes line {
|
| 18 |
+
stroke: var(--axis-color, var(--text-color));
|
| 19 |
+
}
|
| 20 |
+
|
| 21 |
+
.d3-calibration-curves .axes text {
|
| 22 |
+
fill: var(--tick-color, var(--muted-color));
|
| 23 |
+
font-size: 11px;
|
| 24 |
+
}
|
| 25 |
+
|
| 26 |
+
.d3-calibration-curves .grid line {
|
| 27 |
+
stroke: var(--grid-color, rgba(0,0,0,.08));
|
| 28 |
+
}
|
| 29 |
+
|
| 30 |
+
.d3-calibration-curves .axes text.axis-label {
|
| 31 |
+
font-size: 14px;
|
| 32 |
+
font-weight: 500;
|
| 33 |
+
fill: var(--text-color);
|
| 34 |
+
}
|
| 35 |
+
|
| 36 |
+
.d3-calibration-curves .x-axis text {
|
| 37 |
+
transform: translateY(4px);
|
| 38 |
+
}
|
| 39 |
+
|
| 40 |
+
.d3-calibration-curves .calibration-line {
|
| 41 |
+
fill: none;
|
| 42 |
+
stroke-width: 1.5;
|
| 43 |
+
}
|
| 44 |
+
|
| 45 |
+
.d3-calibration-curves .perfect-line {
|
| 46 |
+
fill: none;
|
| 47 |
+
stroke: var(--muted-color);
|
| 48 |
+
stroke-width: 1.5;
|
| 49 |
+
stroke-dasharray: 8, 6;
|
| 50 |
+
opacity: 0.6;
|
| 51 |
+
}
|
| 52 |
+
|
| 53 |
+
.d3-calibration-curves .data-point {
|
| 54 |
+
cursor: pointer;
|
| 55 |
+
transition: transform 0.15s ease, opacity 0.15s ease;
|
| 56 |
+
}
|
| 57 |
+
|
| 58 |
+
.d3-calibration-curves .data-point:hover {
|
| 59 |
+
opacity: 0.8;
|
| 60 |
+
}
|
| 61 |
+
|
| 62 |
+
.d3-calibration-curves .legend {
|
| 63 |
+
font-size: 11px;
|
| 64 |
+
}
|
| 65 |
+
|
| 66 |
+
.d3-calibration-curves .legend-item {
|
| 67 |
+
cursor: pointer;
|
| 68 |
+
}
|
| 69 |
+
|
| 70 |
+
.d3-calibration-curves .legend-item.dimmed .legend-line,
|
| 71 |
+
.d3-calibration-curves .legend-item.dimmed .legend-marker {
|
| 72 |
+
opacity: 0.3;
|
| 73 |
+
}
|
| 74 |
+
|
| 75 |
+
.d3-calibration-curves .legend-item.dimmed text {
|
| 76 |
+
opacity: 0.4;
|
| 77 |
+
}
|
| 78 |
+
|
| 79 |
+
.d3-calibration-curves .legend-text {
|
| 80 |
+
fill: var(--text-color);
|
| 81 |
+
}
|
| 82 |
+
|
| 83 |
+
.d3-calibration-curves .d3-tooltip {
|
| 84 |
+
position: absolute;
|
| 85 |
+
top: 0;
|
| 86 |
+
left: 0;
|
| 87 |
+
transform: translate(-9999px, -9999px);
|
| 88 |
+
pointer-events: none;
|
| 89 |
+
padding: 10px 12px;
|
| 90 |
+
border-radius: 8px;
|
| 91 |
+
font-size: 12px;
|
| 92 |
+
line-height: 1.4;
|
| 93 |
+
border: 1px solid var(--border-color);
|
| 94 |
+
background: var(--surface-bg);
|
| 95 |
+
color: var(--text-color);
|
| 96 |
+
box-shadow: 0 4px 24px rgba(0,0,0,.18);
|
| 97 |
+
opacity: 0;
|
| 98 |
+
transition: opacity 0.12s ease;
|
| 99 |
+
z-index: 10;
|
| 100 |
+
}
|
| 101 |
+
|
| 102 |
+
.d3-calibration-curves .d3-tooltip .model-name {
|
| 103 |
+
font-weight: 600;
|
| 104 |
+
margin-bottom: 4px;
|
| 105 |
+
}
|
| 106 |
+
|
| 107 |
+
.d3-calibration-curves .d3-tooltip .metric {
|
| 108 |
+
display: flex;
|
| 109 |
+
justify-content: space-between;
|
| 110 |
+
gap: 16px;
|
| 111 |
+
}
|
| 112 |
+
|
| 113 |
+
.d3-calibration-curves .d3-tooltip .metric-label {
|
| 114 |
+
color: var(--muted-color);
|
| 115 |
+
}
|
| 116 |
+
|
| 117 |
+
.d3-calibration-curves .d3-tooltip .metric-value {
|
| 118 |
+
font-weight: 500;
|
| 119 |
+
}
|
| 120 |
+
</style>
|
| 121 |
+
<script>
|
| 122 |
+
(() => {
|
| 123 |
+
const ensureD3 = (cb) => {
|
| 124 |
+
if (window.d3 && typeof window.d3.select === 'function') return cb();
|
| 125 |
+
let s = document.getElementById('d3-cdn-script');
|
| 126 |
+
if (!s) {
|
| 127 |
+
s = document.createElement('script');
|
| 128 |
+
s.id = 'd3-cdn-script';
|
| 129 |
+
s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
|
| 130 |
+
document.head.appendChild(s);
|
| 131 |
+
}
|
| 132 |
+
const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
|
| 133 |
+
s.addEventListener('load', onReady, { once: true });
|
| 134 |
+
if (window.d3) onReady();
|
| 135 |
+
};
|
| 136 |
+
|
| 137 |
+
const bootstrap = () => {
|
| 138 |
+
const scriptEl = document.currentScript;
|
| 139 |
+
let container = scriptEl ? scriptEl.previousElementSibling : null;
|
| 140 |
+
if (!(container && container.classList && container.classList.contains('d3-calibration-curves'))) {
|
| 141 |
+
const candidates = Array.from(document.querySelectorAll('.d3-calibration-curves'))
|
| 142 |
+
.filter((el) => !(el.dataset && el.dataset.mounted === 'true'));
|
| 143 |
+
container = candidates[candidates.length - 1] || null;
|
| 144 |
+
}
|
| 145 |
+
if (!container) return;
|
| 146 |
+
if (container.dataset) {
|
| 147 |
+
if (container.dataset.mounted === 'true') return;
|
| 148 |
+
container.dataset.mounted = 'true';
|
| 149 |
+
}
|
| 150 |
+
|
| 151 |
+
// Tooltip setup
|
| 152 |
+
container.style.position = container.style.position || 'relative';
|
| 153 |
+
const tip = document.createElement('div');
|
| 154 |
+
tip.className = 'd3-tooltip';
|
| 155 |
+
container.appendChild(tip);
|
| 156 |
+
|
| 157 |
+
// SVG setup
|
| 158 |
+
const svg = d3.select(container).append('svg');
|
| 159 |
+
const gRoot = svg.append('g');
|
| 160 |
+
|
| 161 |
+
// Chart groups (order matters for layering)
|
| 162 |
+
const gGrid = gRoot.append('g').attr('class', 'grid');
|
| 163 |
+
const gPerfect = gRoot.append('g').attr('class', 'perfect');
|
| 164 |
+
const gLines = gRoot.append('g').attr('class', 'lines');
|
| 165 |
+
const gPoints = gRoot.append('g').attr('class', 'points');
|
| 166 |
+
const gAxes = gRoot.append('g').attr('class', 'axes');
|
| 167 |
+
const gLegend = gRoot.append('g').attr('class', 'legend');
|
| 168 |
+
|
| 169 |
+
// State
|
| 170 |
+
let data = null;
|
| 171 |
+
let width = 800;
|
| 172 |
+
let height = 500;
|
| 173 |
+
const margin = { top: 20, right: 180, bottom: 56, left: 72 };
|
| 174 |
+
let hiddenModels = new Set();
|
| 175 |
+
|
| 176 |
+
// Scales
|
| 177 |
+
const xScale = d3.scaleLinear();
|
| 178 |
+
const yScale = d3.scaleLinear();
|
| 179 |
+
|
| 180 |
+
// Line generator - convert confidence level to probability (divide by 10)
|
| 181 |
+
const line = d3.line()
|
| 182 |
+
.x(d => xScale(d.confidence_level / 10))
|
| 183 |
+
.y(d => yScale(d.actual_success_rate));
|
| 184 |
+
|
| 185 |
+
// Data loading
|
| 186 |
+
const DATA_URL = '/data/calibration_curves.json';
|
| 187 |
+
|
| 188 |
+
function updateSize() {
|
| 189 |
+
width = container.clientWidth || 800;
|
| 190 |
+
// Calculate inner dimensions, ensuring square plot area
|
| 191 |
+
const availableWidth = width - margin.left - margin.right;
|
| 192 |
+
const maxHeight = Math.round(width * 0.8); // Limit max height
|
| 193 |
+
const innerSize = Math.min(availableWidth, maxHeight - margin.top - margin.bottom);
|
| 194 |
+
height = innerSize + margin.top + margin.bottom;
|
| 195 |
+
svg.attr('width', width).attr('height', height).attr('viewBox', `0 0 ${width} ${height}`);
|
| 196 |
+
gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
|
| 197 |
+
return {
|
| 198 |
+
innerWidth: innerSize,
|
| 199 |
+
innerHeight: innerSize
|
| 200 |
+
};
|
| 201 |
+
}
|
| 202 |
+
|
| 203 |
+
function showTooltip(event, d, model) {
|
| 204 |
+
const rect = container.getBoundingClientRect();
|
| 205 |
+
const x = event.clientX - rect.left;
|
| 206 |
+
const y = event.clientY - rect.top;
|
| 207 |
+
|
| 208 |
+
const reportedConfidence = d.confidence_level / 10;
|
| 209 |
+
|
| 210 |
+
tip.innerHTML = `
|
| 211 |
+
<div class="model-name" style="color: ${model.color}">${model.name}</div>
|
| 212 |
+
<div class="metric">
|
| 213 |
+
<span class="metric-label">Reported confidence:</span>
|
| 214 |
+
<span class="metric-value">${Math.round(reportedConfidence * 100)}%</span>
|
| 215 |
+
</div>
|
| 216 |
+
<div class="metric">
|
| 217 |
+
<span class="metric-label">Actual success:</span>
|
| 218 |
+
<span class="metric-value">${(d.actual_success_rate * 100).toFixed(1)}%</span>
|
| 219 |
+
</div>
|
| 220 |
+
<div class="metric">
|
| 221 |
+
<span class="metric-label">Sample size:</span>
|
| 222 |
+
<span class="metric-value">${d.sample_count}</span>
|
| 223 |
+
</div>
|
| 224 |
+
`;
|
| 225 |
+
|
| 226 |
+
const tipWidth = tip.offsetWidth || 150;
|
| 227 |
+
const tipHeight = tip.offsetHeight || 100;
|
| 228 |
+
let tipX = x + 12;
|
| 229 |
+
let tipY = y - tipHeight / 2;
|
| 230 |
+
|
| 231 |
+
if (tipX + tipWidth > width) tipX = x - tipWidth - 12;
|
| 232 |
+
if (tipY < 0) tipY = 8;
|
| 233 |
+
if (tipY + tipHeight > height) tipY = height - tipHeight - 8;
|
| 234 |
+
|
| 235 |
+
tip.style.transform = `translate(${tipX}px, ${tipY}px)`;
|
| 236 |
+
tip.style.opacity = '1';
|
| 237 |
+
}
|
| 238 |
+
|
| 239 |
+
function hideTooltip() {
|
| 240 |
+
tip.style.opacity = '0';
|
| 241 |
+
tip.style.transform = 'translate(-9999px, -9999px)';
|
| 242 |
+
}
|
| 243 |
+
|
| 244 |
+
function toggleModel(modelName) {
|
| 245 |
+
if (hiddenModels.has(modelName)) {
|
| 246 |
+
hiddenModels.delete(modelName);
|
| 247 |
+
} else {
|
| 248 |
+
hiddenModels.add(modelName);
|
| 249 |
+
}
|
| 250 |
+
render();
|
| 251 |
+
}
|
| 252 |
+
|
| 253 |
+
function render() {
|
| 254 |
+
if (!data) return;
|
| 255 |
+
|
| 256 |
+
const { innerWidth, innerHeight } = updateSize();
|
| 257 |
+
const models = data.models;
|
| 258 |
+
|
| 259 |
+
// Equal scales for both axes (0-1 probability) to ensure 45° diagonal
|
| 260 |
+
xScale
|
| 261 |
+
.domain([0, 1])
|
| 262 |
+
.range([0, innerWidth]);
|
| 263 |
+
|
| 264 |
+
yScale
|
| 265 |
+
.domain([0, 1])
|
| 266 |
+
.range([innerHeight, 0]);
|
| 267 |
+
|
| 268 |
+
// Grid lines - same ticks for both axes
|
| 269 |
+
const ticks = [0, 0.2, 0.4, 0.6, 0.8, 1.0];
|
| 270 |
+
const xTicks = ticks;
|
| 271 |
+
const yTicks = ticks;
|
| 272 |
+
|
| 273 |
+
gGrid.selectAll('.grid-x')
|
| 274 |
+
.data(xTicks)
|
| 275 |
+
.join('line')
|
| 276 |
+
.attr('class', 'grid-x')
|
| 277 |
+
.attr('x1', d => xScale(d))
|
| 278 |
+
.attr('x2', d => xScale(d))
|
| 279 |
+
.attr('y1', 0)
|
| 280 |
+
.attr('y2', innerHeight);
|
| 281 |
+
|
| 282 |
+
gGrid.selectAll('.grid-y')
|
| 283 |
+
.data(yTicks)
|
| 284 |
+
.join('line')
|
| 285 |
+
.attr('class', 'grid-y')
|
| 286 |
+
.attr('x1', 0)
|
| 287 |
+
.attr('x2', innerWidth)
|
| 288 |
+
.attr('y1', d => yScale(d))
|
| 289 |
+
.attr('y2', d => yScale(d));
|
| 290 |
+
|
| 291 |
+
// Perfect calibration line (diagonal from 0,0 to 1,1)
|
| 292 |
+
gPerfect.selectAll('.perfect-line')
|
| 293 |
+
.data([0])
|
| 294 |
+
.join('line')
|
| 295 |
+
.attr('class', 'perfect-line')
|
| 296 |
+
.attr('x1', xScale(0))
|
| 297 |
+
.attr('y1', yScale(0))
|
| 298 |
+
.attr('x2', xScale(1))
|
| 299 |
+
.attr('y2', yScale(1));
|
| 300 |
+
|
| 301 |
+
// Axes - format as percentages
|
| 302 |
+
const tickSize = 6;
|
| 303 |
+
const percentFormat = d => `${Math.round(d * 100)}%`;
|
| 304 |
+
|
| 305 |
+
gAxes.selectAll('.x-axis')
|
| 306 |
+
.data([0])
|
| 307 |
+
.join('g')
|
| 308 |
+
.attr('class', 'x-axis')
|
| 309 |
+
.attr('transform', `translate(0,${innerHeight})`)
|
| 310 |
+
.call(d3.axisBottom(xScale)
|
| 311 |
+
.tickValues(xTicks)
|
| 312 |
+
.tickFormat(percentFormat)
|
| 313 |
+
.tickSizeInner(-tickSize)
|
| 314 |
+
.tickSizeOuter(0));
|
| 315 |
+
|
| 316 |
+
gAxes.selectAll('.y-axis')
|
| 317 |
+
.data([0])
|
| 318 |
+
.join('g')
|
| 319 |
+
.attr('class', 'y-axis')
|
| 320 |
+
.call(d3.axisLeft(yScale)
|
| 321 |
+
.tickValues(yTicks)
|
| 322 |
+
.tickFormat(percentFormat)
|
| 323 |
+
.tickSizeInner(-tickSize)
|
| 324 |
+
.tickSizeOuter(0));
|
| 325 |
+
|
| 326 |
+
// Axis labels
|
| 327 |
+
gAxes.selectAll('.x-label')
|
| 328 |
+
.data([0])
|
| 329 |
+
.join('text')
|
| 330 |
+
.attr('class', 'x-label axis-label')
|
| 331 |
+
.attr('x', innerWidth / 2)
|
| 332 |
+
.attr('y', innerHeight + 44)
|
| 333 |
+
.attr('text-anchor', 'middle')
|
| 334 |
+
.text('Reported Confidence');
|
| 335 |
+
|
| 336 |
+
gAxes.selectAll('.y-label')
|
| 337 |
+
.data([0])
|
| 338 |
+
.join('text')
|
| 339 |
+
.attr('class', 'y-label axis-label')
|
| 340 |
+
.attr('x', -innerHeight / 2)
|
| 341 |
+
.attr('y', -52)
|
| 342 |
+
.attr('text-anchor', 'middle')
|
| 343 |
+
.attr('transform', 'rotate(-90)')
|
| 344 |
+
.text('Actual Success Rate');
|
| 345 |
+
|
| 346 |
+
// Lines for each model
|
| 347 |
+
const visibleModels = models.filter(m => !hiddenModels.has(m.name));
|
| 348 |
+
|
| 349 |
+
gLines.selectAll('.calibration-line')
|
| 350 |
+
.data(visibleModels, d => d.name)
|
| 351 |
+
.join('path')
|
| 352 |
+
.attr('class', 'calibration-line')
|
| 353 |
+
.attr('d', d => line(d.calibration_points))
|
| 354 |
+
.attr('stroke', d => d.color);
|
| 355 |
+
|
| 356 |
+
// Data points - circles for closed models, stars for open models
|
| 357 |
+
const allPoints = visibleModels.flatMap(model =>
|
| 358 |
+
model.calibration_points.map(p => ({ ...p, model }))
|
| 359 |
+
);
|
| 360 |
+
const closedPoints = allPoints.filter(d => !d.model.is_open);
|
| 361 |
+
const openPoints = allPoints.filter(d => d.model.is_open);
|
| 362 |
+
|
| 363 |
+
// Helper function to create a 5-point star path
|
| 364 |
+
const starPath = (cx, cy, outerR, innerR) => {
|
| 365 |
+
const points = [];
|
| 366 |
+
for (let i = 0; i < 10; i++) {
|
| 367 |
+
const r = i % 2 === 0 ? outerR : innerR;
|
| 368 |
+
const angle = (Math.PI / 2) + (i * Math.PI / 5);
|
| 369 |
+
points.push([cx + r * Math.cos(angle), cy - r * Math.sin(angle)]);
|
| 370 |
+
}
|
| 371 |
+
return 'M' + points.map(p => p.join(',')).join('L') + 'Z';
|
| 372 |
+
};
|
| 373 |
+
|
| 374 |
+
// Circles for closed models
|
| 375 |
+
gPoints.selectAll('.data-point-circle')
|
| 376 |
+
.data(closedPoints, d => `${d.model.name}-${d.confidence_level}`)
|
| 377 |
+
.join('circle')
|
| 378 |
+
.attr('class', 'data-point data-point-circle')
|
| 379 |
+
.attr('cx', d => xScale(d.confidence_level / 10))
|
| 380 |
+
.attr('cy', d => yScale(d.actual_success_rate))
|
| 381 |
+
.attr('r', 4)
|
| 382 |
+
.attr('fill', d => d.model.color)
|
| 383 |
+
.attr('stroke', 'var(--surface-bg, white)')
|
| 384 |
+
.attr('stroke-width', 1)
|
| 385 |
+
.on('mouseenter', (event, d) => showTooltip(event, d, d.model))
|
| 386 |
+
.on('mousemove', (event, d) => showTooltip(event, d, d.model))
|
| 387 |
+
.on('mouseleave', hideTooltip);
|
| 388 |
+
|
| 389 |
+
// Stars for open models
|
| 390 |
+
gPoints.selectAll('.data-point-star')
|
| 391 |
+
.data(openPoints, d => `${d.model.name}-${d.confidence_level}`)
|
| 392 |
+
.join('path')
|
| 393 |
+
.attr('class', 'data-point data-point-star')
|
| 394 |
+
.attr('d', d => starPath(
|
| 395 |
+
xScale(d.confidence_level / 10),
|
| 396 |
+
yScale(d.actual_success_rate),
|
| 397 |
+
6, 2.6
|
| 398 |
+
))
|
| 399 |
+
.attr('fill', d => d.model.color)
|
| 400 |
+
.attr('stroke', 'var(--surface-bg, white)')
|
| 401 |
+
.attr('stroke-width', 0.8)
|
| 402 |
+
.on('mouseenter', (event, d) => showTooltip(event, d, d.model))
|
| 403 |
+
.on('mousemove', (event, d) => showTooltip(event, d, d.model))
|
| 404 |
+
.on('mouseleave', hideTooltip);
|
| 405 |
+
|
| 406 |
+
// Legend
|
| 407 |
+
const legendX = innerWidth + 16;
|
| 408 |
+
const legendItemHeight = 20;
|
| 409 |
+
|
| 410 |
+
// Perfect calibration in legend
|
| 411 |
+
const legendItems = [
|
| 412 |
+
{ name: 'Perfect calibration', color: 'var(--muted-color)', isPerfect: true }
|
| 413 |
+
].concat(models);
|
| 414 |
+
|
| 415 |
+
gLegend.selectAll('.legend-item')
|
| 416 |
+
.data(legendItems, d => d.name)
|
| 417 |
+
.join('g')
|
| 418 |
+
.attr('class', d => {
|
| 419 |
+
if (d.isPerfect) return 'legend-item';
|
| 420 |
+
return `legend-item ${hiddenModels.has(d.name) ? 'dimmed' : ''}`;
|
| 421 |
+
})
|
| 422 |
+
.attr('transform', (d, i) => `translate(${legendX}, ${i * legendItemHeight})`)
|
| 423 |
+
.each(function(d) {
|
| 424 |
+
const g = d3.select(this);
|
| 425 |
+
g.selectAll('*').remove();
|
| 426 |
+
|
| 427 |
+
if (d.isPerfect) {
|
| 428 |
+
// Dashed line for perfect calibration
|
| 429 |
+
g.append('line')
|
| 430 |
+
.attr('class', 'legend-line')
|
| 431 |
+
.attr('x1', 0)
|
| 432 |
+
.attr('x2', 20)
|
| 433 |
+
.attr('y1', 0)
|
| 434 |
+
.attr('y2', 0)
|
| 435 |
+
.attr('stroke', d.color)
|
| 436 |
+
.attr('stroke-width', 1.5)
|
| 437 |
+
.attr('stroke-dasharray', '6, 4')
|
| 438 |
+
.attr('opacity', 0.6);
|
| 439 |
+
} else {
|
| 440 |
+
// Line segment (solid for all models)
|
| 441 |
+
g.append('line')
|
| 442 |
+
.attr('class', 'legend-line')
|
| 443 |
+
.attr('x1', 0)
|
| 444 |
+
.attr('x2', 20)
|
| 445 |
+
.attr('y1', 0)
|
| 446 |
+
.attr('y2', 0)
|
| 447 |
+
.attr('stroke', d.color)
|
| 448 |
+
.attr('stroke-width', 1.5);
|
| 449 |
+
|
| 450 |
+
// Marker - circle for closed, star for open
|
| 451 |
+
if (d.is_open) {
|
| 452 |
+
// Small star for open models
|
| 453 |
+
const starPath = (cx, cy, outerR, innerR) => {
|
| 454 |
+
const points = [];
|
| 455 |
+
for (let i = 0; i < 10; i++) {
|
| 456 |
+
const r = i % 2 === 0 ? outerR : innerR;
|
| 457 |
+
const angle = (Math.PI / 2) + (i * Math.PI / 5);
|
| 458 |
+
points.push([cx + r * Math.cos(angle), cy - r * Math.sin(angle)]);
|
| 459 |
+
}
|
| 460 |
+
return 'M' + points.map(p => p.join(',')).join('L') + 'Z';
|
| 461 |
+
};
|
| 462 |
+
g.append('path')
|
| 463 |
+
.attr('class', 'legend-marker')
|
| 464 |
+
.attr('d', starPath(10, 0, 6, 2.6))
|
| 465 |
+
.attr('fill', d.color);
|
| 466 |
+
} else {
|
| 467 |
+
g.append('circle')
|
| 468 |
+
.attr('class', 'legend-marker')
|
| 469 |
+
.attr('cx', 10)
|
| 470 |
+
.attr('cy', 0)
|
| 471 |
+
.attr('r', 3.5)
|
| 472 |
+
.attr('fill', d.color);
|
| 473 |
+
}
|
| 474 |
+
}
|
| 475 |
+
|
| 476 |
+
g.append('text')
|
| 477 |
+
.attr('class', 'legend-text')
|
| 478 |
+
.attr('x', 26)
|
| 479 |
+
.attr('y', 4)
|
| 480 |
+
.text(d.name);
|
| 481 |
+
|
| 482 |
+
if (!d.isPerfect) {
|
| 483 |
+
g.style('cursor', 'pointer')
|
| 484 |
+
.on('click', () => toggleModel(d.name));
|
| 485 |
+
}
|
| 486 |
+
});
|
| 487 |
+
|
| 488 |
+
// Legend note about marker shapes (circle = closed, star = open)
|
| 489 |
+
const noteY = legendItems.length * legendItemHeight + 12;
|
| 490 |
+
gLegend.selectAll('.legend-note')
|
| 491 |
+
.data([0])
|
| 492 |
+
.join('text')
|
| 493 |
+
.attr('class', 'legend-note')
|
| 494 |
+
.attr('x', legendX)
|
| 495 |
+
.attr('y', noteY)
|
| 496 |
+
.attr('font-size', '10px')
|
| 497 |
+
.attr('fill', 'var(--muted-color)')
|
| 498 |
+
.text('● = Closed, ★ = Open');
|
| 499 |
+
}
|
| 500 |
+
|
| 501 |
+
// Initialize
|
| 502 |
+
fetch(DATA_URL, { cache: 'no-cache' })
|
| 503 |
+
.then(r => { if (!r.ok) throw new Error(`HTTP ${r.status}`); return r.json(); })
|
| 504 |
+
.then(json => {
|
| 505 |
+
data = json;
|
| 506 |
+
render();
|
| 507 |
+
})
|
| 508 |
+
.catch(err => {
|
| 509 |
+
const pre = document.createElement('pre');
|
| 510 |
+
pre.style.color = 'red';
|
| 511 |
+
pre.style.padding = '16px';
|
| 512 |
+
pre.textContent = `Error loading data: ${err.message}`;
|
| 513 |
+
container.appendChild(pre);
|
| 514 |
+
});
|
| 515 |
+
|
| 516 |
+
// Resize handling
|
| 517 |
+
if (window.ResizeObserver) {
|
| 518 |
+
new ResizeObserver(() => render()).observe(container);
|
| 519 |
+
} else {
|
| 520 |
+
window.addEventListener('resize', render);
|
| 521 |
+
}
|
| 522 |
+
|
| 523 |
+
// Theme change handling
|
| 524 |
+
const observer = new MutationObserver(() => render());
|
| 525 |
+
observer.observe(document.documentElement, {
|
| 526 |
+
attributes: true,
|
| 527 |
+
attributeFilter: ['data-theme']
|
| 528 |
+
});
|
| 529 |
+
};
|
| 530 |
+
|
| 531 |
+
if (document.readyState === 'loading') {
|
| 532 |
+
document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
|
| 533 |
+
} else {
|
| 534 |
+
ensureD3(bootstrap);
|
| 535 |
+
}
|
| 536 |
+
})();
|
| 537 |
+
</script>
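Review note: the embed above defines the same `starPath` helper twice, once for the data points and once inside the legend's `.each` callback. The sketch below shows how the helper could be hoisted into one shared function. The logic is copied from the diff. Only the placement as a standalone top-level function is an assumption, not something the commit does.

```javascript
// Build an SVG path string for a five-pointed star centred at (cx, cy).
// Ten vertices alternate between the outer and inner radius, starting
// from the top point (angle = PI / 2) and stepping by PI / 5.
function starPath(cx, cy, outerR, innerR) {
  const points = [];
  for (let i = 0; i < 10; i++) {
    const r = i % 2 === 0 ? outerR : innerR;
    const angle = Math.PI / 2 + (i * Math.PI) / 5;
    // SVG's y axis points down, hence the minus sign on the sine term.
    points.push([cx + r * Math.cos(angle), cy - r * Math.sin(angle)]);
  }
  return 'M' + points.map((p) => p.join(',')).join('L') + 'Z';
}

// Same arguments as the legend marker in the diff.
const legendStar = starPath(10, 0, 6, 2.6);
```

With a single definition, both the `.data-point-star` paths and the legend marker could call the same function, so a change to the star geometry would only need to be made once.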
|
app/src/content/embeds/caution-vs-failed-guesses.html
ADDED
|
@@ -0,0 +1,369 @@
| 1 |
+
<div class="d3-caution-vs-failed-guesses"></div>
|
| 2 |
+
<style>
|
| 3 |
+
.d3-caution-vs-failed-guesses {
|
| 4 |
+
width: 100%;
|
| 5 |
+
margin: 10px 0;
|
| 6 |
+
position: relative;
|
| 7 |
+
font-family: system-ui, -apple-system, sans-serif;
|
| 8 |
+
}
|
| 9 |
+
|
| 10 |
+
.d3-caution-vs-failed-guesses svg {
|
| 11 |
+
display: block;
|
| 12 |
+
width: 100%;
|
| 13 |
+
height: auto;
|
| 14 |
+
}
|
| 15 |
+
|
| 16 |
+
.d3-caution-vs-failed-guesses .axes path,
|
| 17 |
+
.d3-caution-vs-failed-guesses .axes line {
|
| 18 |
+
stroke: var(--axis-color, var(--text-color));
|
| 19 |
+
}
|
| 20 |
+
|
| 21 |
+
.d3-caution-vs-failed-guesses .axes text {
|
| 22 |
+
fill: var(--tick-color, var(--muted-color));
|
| 23 |
+
font-size: 11px;
|
| 24 |
+
}
|
| 25 |
+
|
| 26 |
+
.d3-caution-vs-failed-guesses .grid line {
|
| 27 |
+
stroke: var(--grid-color, rgba(0,0,0,.08));
|
| 28 |
+
}
|
| 29 |
+
|
| 30 |
+
.d3-caution-vs-failed-guesses .axes text.axis-label {
|
| 31 |
+
font-size: 15px;
|
| 32 |
+
font-weight: 500;
|
| 33 |
+
fill: var(--text-color);
|
| 34 |
+
}
|
| 35 |
+
|
| 36 |
+
.d3-caution-vs-failed-guesses .x-axis text {
|
| 37 |
+
transform: translateY(4px);
|
| 38 |
+
}
|
| 39 |
+
|
| 40 |
+
.d3-caution-vs-failed-guesses .point {
|
| 41 |
+
cursor: pointer;
|
| 42 |
+
transition: opacity 0.15s ease;
|
| 43 |
+
}
|
| 44 |
+
|
| 45 |
+
.d3-caution-vs-failed-guesses .point:hover {
|
| 46 |
+
opacity: 0.8;
|
| 47 |
+
}
|
| 48 |
+
|
| 49 |
+
.d3-caution-vs-failed-guesses .point-label {
|
| 50 |
+
font-size: 11px;
|
| 51 |
+
fill: var(--text-color);
|
| 52 |
+
pointer-events: none;
|
| 53 |
+
}
|
| 54 |
+
|
| 55 |
+
.d3-caution-vs-failed-guesses .d3-tooltip {
|
| 56 |
+
position: absolute;
|
| 57 |
+
top: 0;
|
| 58 |
+
left: 0;
|
| 59 |
+
transform: translate(-9999px, -9999px);
|
| 60 |
+
pointer-events: none;
|
| 61 |
+
padding: 10px 12px;
|
| 62 |
+
border-radius: 8px;
|
| 63 |
+
font-size: 12px;
|
| 64 |
+
line-height: 1.4;
|
| 65 |
+
border: 1px solid var(--border-color);
|
| 66 |
+
background: var(--surface-bg);
|
| 67 |
+
color: var(--text-color);
|
| 68 |
+
box-shadow: 0 4px 24px rgba(0,0,0,.18);
|
| 69 |
+
opacity: 0;
|
| 70 |
+
transition: opacity 0.12s ease;
|
| 71 |
+
z-index: 10;
|
| 72 |
+
}
|
| 73 |
+
|
| 74 |
+
.d3-caution-vs-failed-guesses .d3-tooltip .model-name {
|
| 75 |
+
font-weight: 600;
|
| 76 |
+
margin-bottom: 4px;
|
| 77 |
+
}
|
| 78 |
+
|
| 79 |
+
.d3-caution-vs-failed-guesses .d3-tooltip .metric {
|
| 80 |
+
display: flex;
|
| 81 |
+
justify-content: space-between;
|
| 82 |
+
gap: 16px;
|
| 83 |
+
}
|
| 84 |
+
|
| 85 |
+
.d3-caution-vs-failed-guesses .d3-tooltip .metric-label {
|
| 86 |
+
color: var(--muted-color);
|
| 87 |
+
}
|
| 88 |
+
|
| 89 |
+
.d3-caution-vs-failed-guesses .d3-tooltip .metric-value {
|
| 90 |
+
font-weight: 500;
|
| 91 |
+
}
|
| 92 |
+
</style>
|
| 93 |
+
<script>
|
| 94 |
+
(() => {
|
| 95 |
+
const ensureD3 = (cb) => {
|
| 96 |
+
if (window.d3 && typeof window.d3.select === 'function') return cb();
|
| 97 |
+
let s = document.getElementById('d3-cdn-script');
|
| 98 |
+
if (!s) {
|
| 99 |
+
s = document.createElement('script');
|
| 100 |
+
s.id = 'd3-cdn-script';
|
| 101 |
+
s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
|
| 102 |
+
document.head.appendChild(s);
|
| 103 |
+
}
|
| 104 |
+
const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
|
| 105 |
+
s.addEventListener('load', onReady, { once: true });
|
| 106 |
+
if (window.d3) onReady();
|
| 107 |
+
};
|
| 108 |
+
|
| 109 |
+
const bootstrap = () => {
|
| 110 |
+
const scriptEl = document.currentScript;
|
| 111 |
+
let container = scriptEl ? scriptEl.previousElementSibling : null;
|
| 112 |
+
if (!(container && container.classList && container.classList.contains('d3-caution-vs-failed-guesses'))) {
|
| 113 |
+
const candidates = Array.from(document.querySelectorAll('.d3-caution-vs-failed-guesses'))
|
| 114 |
+
.filter((el) => !(el.dataset && el.dataset.mounted === 'true'));
|
| 115 |
+
container = candidates[candidates.length - 1] || null;
|
| 116 |
+
}
|
| 117 |
+
if (!container) return;
|
| 118 |
+
if (container.dataset) {
|
| 119 |
+
if (container.dataset.mounted === 'true') return;
|
| 120 |
+
container.dataset.mounted = 'true';
|
| 121 |
+
}
|
| 122 |
+
|
| 123 |
+
// Tooltip setup
|
| 124 |
+
container.style.position = container.style.position || 'relative';
|
| 125 |
+
const tip = document.createElement('div');
|
| 126 |
+
tip.className = 'd3-tooltip';
|
| 127 |
+
container.appendChild(tip);
|
| 128 |
+
|
| 129 |
+
// SVG setup
|
| 130 |
+
const svg = d3.select(container).append('svg');
|
| 131 |
+
const gRoot = svg.append('g');
|
| 132 |
+
|
| 133 |
+
// Chart groups
|
| 134 |
+
const gGrid = gRoot.append('g').attr('class', 'grid');
|
| 135 |
+
const gAxes = gRoot.append('g').attr('class', 'axes');
|
| 136 |
+
const gPoints = gRoot.append('g').attr('class', 'points');
|
| 137 |
+
const gLabels = gRoot.append('g').attr('class', 'labels');
|
| 138 |
+
|
| 139 |
+
// State
|
| 140 |
+
let data = null;
|
| 141 |
+
let width = 800;
|
| 142 |
+
let height = 450;
|
| 143 |
+
const margin = { top: 20, right: 120, bottom: 56, left: 72 };
|
| 144 |
+
|
| 145 |
+
// Scales
|
| 146 |
+
const xScale = d3.scaleLinear();
|
| 147 |
+
const yScale = d3.scaleLinear();
|
| 148 |
+
|
| 149 |
+
// Data loading
|
| 150 |
+
const DATA_URL = '/data/caution_vs_failed_guesses.json';
|
| 151 |
+
|
| 152 |
+
function updateSize() {
|
| 153 |
+
width = container.clientWidth || 800;
|
| 154 |
+
height = Math.max(300, Math.round(width / 1.5));
|
| 155 |
+
svg.attr('width', width).attr('height', height).attr('viewBox', `0 0 ${width} ${height}`);
|
| 156 |
+
gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
|
| 157 |
+
return {
|
| 158 |
+
innerWidth: width - margin.left - margin.right,
|
| 159 |
+
innerHeight: height - margin.top - margin.bottom
|
| 160 |
+
};
|
| 161 |
+
}
|
| 162 |
+
|
| 163 |
+
function showTooltip(event, d) {
|
| 164 |
+
const rect = container.getBoundingClientRect();
|
| 165 |
+
const x = event.clientX - rect.left;
|
| 166 |
+
const y = event.clientY - rect.top;
|
| 167 |
+
|
| 168 |
+
tip.innerHTML = `
|
| 169 |
+
<div class="model-name" style="color: ${d.color}">${d.name}</div>
|
| 170 |
+
<div class="metric">
|
| 171 |
+
<span class="metric-label">Early Correct Turns:</span>
|
| 172 |
+
<span class="metric-value">${d.avg_early_correct_turns.toFixed(2)}</span>
|
| 173 |
+
</div>
|
| 174 |
+
<div class="metric">
|
| 175 |
+
<span class="metric-label">Failed Guesses:</span>
|
| 176 |
+
<span class="metric-value">${d.avg_failed_guesses.toFixed(2)}</span>
|
| 177 |
+
</div>
|
| 178 |
+
<div class="metric">
|
| 179 |
+
<span class="metric-label">Type:</span>
|
| 180 |
+
<span class="metric-value">${d.is_open ? 'Open' : 'Closed'}</span>
|
| 181 |
+
</div>
|
| 182 |
+
`;
|
| 183 |
+
|
| 184 |
+
const tipWidth = tip.offsetWidth || 150;
|
| 185 |
+
const tipHeight = tip.offsetHeight || 80;
|
| 186 |
+
let tipX = x + 12;
|
| 187 |
+
let tipY = y - tipHeight / 2;
|
| 188 |
+
|
| 189 |
+
if (tipX + tipWidth > width) tipX = x - tipWidth - 12;
|
| 190 |
+
if (tipY < 0) tipY = 8;
|
| 191 |
+
if (tipY + tipHeight > height) tipY = height - tipHeight - 8;
|
| 192 |
+
|
| 193 |
+
tip.style.transform = `translate(${tipX}px, ${tipY}px)`;
|
| 194 |
+
tip.style.opacity = '1';
|
| 195 |
+
}
|
| 196 |
+
|
| 197 |
+
function hideTooltip() {
|
| 198 |
+
tip.style.opacity = '0';
|
| 199 |
+
tip.style.transform = 'translate(-9999px, -9999px)';
|
| 200 |
+
}
|
| 201 |
+
|
| 202 |
+
function render() {
|
| 203 |
+
if (!data) return;
|
| 204 |
+
|
| 205 |
+
const { innerWidth, innerHeight } = updateSize();
|
| 206 |
+
const models = data.models;
|
| 207 |
+
|
| 208 |
+
// Update scales - both axes start at 0, padded by 10% of the data extent
|
| 209 |
+
const xExtent = d3.extent(models, d => d.avg_failed_guesses);
|
| 210 |
+
const yExtent = d3.extent(models, d => d.avg_early_correct_turns);
|
| 211 |
+
const xPadding = (xExtent[1] - xExtent[0]) * 0.1;
|
| 212 |
+
const yPadding = (yExtent[1] - yExtent[0]) * 0.1;
|
| 213 |
+
|
| 214 |
+
xScale
|
| 215 |
+
.domain([0, xExtent[1] + xPadding])
|
| 216 |
+
.range([0, innerWidth])
|
| 217 |
+
.nice();
|
| 218 |
+
|
| 219 |
+
yScale
|
| 220 |
+
.domain([0, yExtent[1] + yPadding])
|
| 221 |
+
.range([innerHeight, 0])
|
| 222 |
+
.nice();
|
| 223 |
+
|
| 224 |
+
// Grid lines
|
| 225 |
+
const xTicks = xScale.ticks(6);
|
| 226 |
+
const yTicks = yScale.ticks(6);
|
| 227 |
+
|
| 228 |
+
gGrid.selectAll('.grid-x')
|
| 229 |
+
.data(xTicks)
|
| 230 |
+
.join('line')
|
| 231 |
+
.attr('class', 'grid-x')
|
| 232 |
+
.attr('x1', d => xScale(d))
|
| 233 |
+
.attr('x2', d => xScale(d))
|
| 234 |
+
.attr('y1', 0)
|
| 235 |
+
.attr('y2', innerHeight);
|
| 236 |
+
|
| 237 |
+
gGrid.selectAll('.grid-y')
|
| 238 |
+
.data(yTicks)
|
| 239 |
+
.join('line')
|
| 240 |
+
.attr('class', 'grid-y')
|
| 241 |
+
.attr('x1', 0)
|
| 242 |
+
.attr('x2', innerWidth)
|
| 243 |
+
.attr('y1', d => yScale(d))
|
| 244 |
+
.attr('y2', d => yScale(d));
|
| 245 |
+
|
| 246 |
+
// Axes with inner ticks
|
| 247 |
+
const tickSize = 6;
|
| 248 |
+
gAxes.selectAll('.x-axis')
|
| 249 |
+
.data([0])
|
| 250 |
+
.join('g')
|
| 251 |
+
.attr('class', 'x-axis')
|
| 252 |
+
.attr('transform', `translate(0,${innerHeight})`)
|
| 253 |
+
.call(d3.axisBottom(xScale).ticks(6).tickSizeInner(-tickSize).tickSizeOuter(0));
|
| 254 |
+
|
| 255 |
+
gAxes.selectAll('.y-axis')
|
| 256 |
+
.data([0])
|
| 257 |
+
.join('g')
|
| 258 |
+
.attr('class', 'y-axis')
|
| 259 |
+
.call(d3.axisLeft(yScale).ticks(6).tickSizeInner(-tickSize).tickSizeOuter(0));
|
| 260 |
+
|
| 261 |
+
// Axis labels
|
| 262 |
+
gAxes.selectAll('.x-label')
|
| 263 |
+
.data([0])
|
| 264 |
+
.join('text')
|
| 265 |
+
.attr('class', 'x-label axis-label')
|
| 266 |
+
.attr('x', innerWidth / 2)
|
| 267 |
+
.attr('y', innerHeight + 44)
|
| 268 |
+
.attr('text-anchor', 'middle')
|
| 269 |
+
.text('Average Failed Guesses per Round');
|
| 270 |
+
|
| 271 |
+
gAxes.selectAll('.y-label')
|
| 272 |
+
.data([0])
|
| 273 |
+
.join('text')
|
| 274 |
+
.attr('class', 'y-label axis-label')
|
| 275 |
+
.attr('x', -innerHeight / 2)
|
| 276 |
+
.attr('y', -52)
|
| 277 |
+
.attr('text-anchor', 'middle')
|
| 278 |
+
.attr('transform', 'rotate(-90)')
|
| 279 |
+
.text('Average Early Correct Turns');
|
| 280 |
+
|
| 281 |
+
// Points - circles for closed models, stars for open models
|
| 282 |
+
const pointRadius = Math.max(8, Math.min(16, innerWidth / 60));
|
| 283 |
+
|
| 284 |
+
// Helper function to create a 5-point star path
|
| 285 |
+
const starPath = (cx, cy, outerR, innerR) => {
|
| 286 |
+
const points = [];
|
| 287 |
+
for (let i = 0; i < 10; i++) {
|
| 288 |
+
const r = i % 2 === 0 ? outerR : innerR;
|
| 289 |
+
const angle = (Math.PI / 2) + (i * Math.PI / 5);
|
| 290 |
+
points.push([cx + r * Math.cos(angle), cy - r * Math.sin(angle)]);
|
| 291 |
+
}
|
| 292 |
+
return 'M' + points.map(p => p.join(',')).join('L') + 'Z';
|
| 293 |
+
};
|
| 294 |
+
|
| 295 |
+
// Closed models as circles
|
| 296 |
+
const closedModels = models.filter(d => !d.is_open);
|
| 297 |
+
gPoints.selectAll('.point-circle')
|
| 298 |
+
.data(closedModels, d => d.name)
|
| 299 |
+
.join('circle')
|
| 300 |
+
.attr('class', 'point point-circle')
|
| 301 |
+
.attr('cx', d => xScale(d.avg_failed_guesses))
|
| 302 |
+
.attr('cy', d => yScale(d.avg_early_correct_turns))
|
| 303 |
+
.attr('r', pointRadius)
|
| 304 |
+
.attr('fill', d => d.color)
|
| 305 |
+
.attr('stroke', 'none')
|
| 306 |
+
.on('mouseenter', showTooltip)
|
| 307 |
+
.on('mousemove', showTooltip)
|
| 308 |
+
.on('mouseleave', hideTooltip);
|
| 309 |
+
|
| 310 |
+
// Open models as stars
|
| 311 |
+
const openModels = models.filter(d => d.is_open);
|
| 312 |
+
gPoints.selectAll('.point-star')
|
| 313 |
+
.data(openModels, d => d.name)
|
| 314 |
+
.join('path')
|
| 315 |
+
.attr('class', 'point point-star')
|
| 316 |
+
.attr('d', d => starPath(xScale(d.avg_failed_guesses), yScale(d.avg_early_correct_turns), pointRadius * 1.2, pointRadius * 0.5))
|
| 317 |
+
.attr('fill', d => d.color)
|
| 318 |
+
.attr('stroke', 'none')
|
| 319 |
+
.on('mouseenter', showTooltip)
|
| 320 |
+
.on('mousemove', showTooltip)
|
| 321 |
+
.on('mouseleave', hideTooltip);
|
| 322 |
+
|
| 323 |
+
// Point labels
|
| 324 |
+
gLabels.selectAll('.point-label')
|
| 325 |
+
.data(models)
|
| 326 |
+
.join('text')
|
| 327 |
+
.attr('class', 'point-label')
|
| 328 |
+
.attr('x', d => xScale(d.avg_failed_guesses) + pointRadius + 6)
|
| 329 |
+
.attr('y', d => yScale(d.avg_early_correct_turns) + 4)
|
| 330 |
+
.text(d => d.name);
|
| 331 |
+
}
|
| 332 |
+
|
| 333 |
+
// Initialize
|
| 334 |
+
fetch(DATA_URL, { cache: 'no-cache' })
|
| 335 |
+
.then(r => { if (!r.ok) throw new Error(`HTTP ${r.status}`); return r.json(); })
|
| 336 |
+
.then(json => {
|
| 337 |
+
data = json;
|
| 338 |
+
render();
|
| 339 |
+
})
|
| 340 |
+
.catch(err => {
|
| 341 |
+
const pre = document.createElement('pre');
|
| 342 |
+
pre.style.color = 'red';
|
| 343 |
+
pre.style.padding = '16px';
|
| 344 |
+
pre.textContent = `Error loading data: ${err.message}`;
|
| 345 |
+
container.appendChild(pre);
|
| 346 |
+
});
|
| 347 |
+
|
| 348 |
+
// Resize handling
|
| 349 |
+
if (window.ResizeObserver) {
|
| 350 |
+
new ResizeObserver(() => render()).observe(container);
|
| 351 |
+
} else {
|
| 352 |
+
window.addEventListener('resize', render);
|
| 353 |
+
}
|
| 354 |
+
|
| 355 |
+
// Theme change handling
|
| 356 |
+
const observer = new MutationObserver(() => render());
|
| 357 |
+
observer.observe(document.documentElement, {
|
| 358 |
+
attributes: true,
|
| 359 |
+
attributeFilter: ['data-theme']
|
| 360 |
+
});
|
| 361 |
+
};
|
| 362 |
+
|
| 363 |
+
if (document.readyState === 'loading') {
|
| 364 |
+
document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
|
| 365 |
+
} else {
|
| 366 |
+
ensureD3(bootstrap);
|
| 367 |
+
}
|
| 368 |
+
})();
|
| 369 |
+
</script>
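Review note: the tooltip placement in `showTooltip` mixes DOM reads with the clamping arithmetic, which makes the edge cases hard to check by eye. The sketch below pulls the arithmetic into a pure function. The name `placeTooltip` and its parameter list are hypothetical. The offsets (12 px to the side, 8 px from the top and bottom edges) are the ones used in the diff.

```javascript
// Compute the tooltip's top-left corner for a pointer at (x, y),
// given the tooltip size and the chart's width and height.
function placeTooltip(x, y, tipWidth, tipHeight, width, height) {
  let tipX = x + 12;            // default: to the right of the pointer
  let tipY = y - tipHeight / 2; // vertically centred on the pointer

  // Flip to the left of the pointer if it would overflow the right edge.
  if (tipX + tipWidth > width) tipX = x - tipWidth - 12;
  // Keep an 8px gap from the top and bottom edges.
  if (tipY < 0) tipY = 8;
  if (tipY + tipHeight > height) tipY = height - tipHeight - 8;

  return { tipX, tipY };
}

// Pointer near the right edge of an 800x450 chart: the tooltip flips left.
const nearRightEdge = placeTooltip(750, 100, 150, 80, 800, 450);
```

Inside `showTooltip`, the result could then be applied with `tip.style.transform = \`translate(${tipX}px, ${tipY}px)\``, exactly as the existing code does.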
|
app/src/content/embeds/complexity-analysis.html
ADDED
|
@@ -0,0 +1,492 @@
| 1 |
+
<div class="d3-complexity-analysis"></div>
|
| 2 |
+
<style>
|
| 3 |
+
.d3-complexity-analysis {
|
| 4 |
+
width: 100%;
|
| 5 |
+
margin: 10px 0;
|
| 6 |
+
position: relative;
|
| 7 |
+
font-family: system-ui, -apple-system, sans-serif;
|
| 8 |
+
}
|
| 9 |
+
|
| 10 |
+
.d3-complexity-analysis svg {
|
| 11 |
+
display: block;
|
| 12 |
+
width: 100%;
|
| 13 |
+
height: auto;
|
| 14 |
+
}
|
| 15 |
+
|
| 16 |
+
.d3-complexity-analysis .axes path,
|
| 17 |
+
.d3-complexity-analysis .axes line {
|
| 18 |
+
stroke: var(--axis-color, var(--text-color));
|
| 19 |
+
}
|
| 20 |
+
|
| 21 |
+
.d3-complexity-analysis .axes text {
|
| 22 |
+
fill: var(--tick-color, var(--muted-color));
|
| 23 |
+
font-size: 11px;
|
| 24 |
+
}
|
| 25 |
+
|
| 26 |
+
.d3-complexity-analysis .axes text.axis-label {
|
| 27 |
+
font-size: 14px;
|
| 28 |
+
font-weight: 500;
|
| 29 |
+
fill: var(--text-color);
|
| 30 |
+
}
|
| 31 |
+
|
| 32 |
+
.d3-complexity-analysis .axes text.chart-title {
|
| 33 |
+
font-size: 16px;
|
| 34 |
+
font-weight: 600;
|
| 35 |
+
fill: var(--text-color);
|
| 36 |
+
}
|
| 37 |
+
|
| 38 |
+
.d3-complexity-analysis .cell {
|
| 39 |
+
stroke: var(--surface-bg, #fff);
|
| 40 |
+
stroke-width: 2;
|
| 41 |
+
cursor: pointer;
|
| 42 |
+
transition: opacity 0.1s ease;
|
| 43 |
+
}
|
| 44 |
+
|
| 45 |
+
.d3-complexity-analysis .cell:hover {
|
| 46 |
+
opacity: 0.85;
|
| 47 |
+
}
|
| 48 |
+
|
| 49 |
+
.d3-complexity-analysis .cell-text {
|
| 50 |
+
font-size: 13px;
|
| 51 |
+
font-weight: 600;
|
| 52 |
+
pointer-events: none;
|
| 53 |
+
}
|
| 54 |
+
|
| 55 |
+
.d3-complexity-analysis .model-label {
|
| 56 |
+
font-size: 12px;
|
| 57 |
+
fill: var(--text-color);
|
| 58 |
+
}
|
| 59 |
+
|
| 60 |
+
.d3-complexity-analysis .quartile-label {
|
| 61 |
+
font-size: 12px;
|
| 62 |
+
fill: var(--text-color);
|
| 63 |
+
}
|
| 64 |
+
|
| 65 |
+
.d3-complexity-analysis .legend-title {
|
| 66 |
+
font-size: 11px;
|
| 67 |
+
fill: var(--muted-color);
|
| 68 |
+
}
|
| 69 |
+
|
| 70 |
+
.d3-complexity-analysis .legend-tick {
|
| 71 |
+
font-size: 10px;
|
| 72 |
+
fill: var(--muted-color);
|
| 73 |
+
}
|
| 74 |
+
|
| 75 |
+
.d3-complexity-analysis .d3-tooltip {
|
| 76 |
+
position: absolute;
|
| 77 |
+
top: 0;
|
| 78 |
+
left: 0;
|
| 79 |
+
transform: translate(-9999px, -9999px);
|
| 80 |
+
pointer-events: none;
|
| 81 |
+
padding: 10px 12px;
|
| 82 |
+
border-radius: 8px;
|
| 83 |
+
font-size: 12px;
|
| 84 |
+
line-height: 1.5;
|
| 85 |
+
border: 1px solid var(--border-color);
|
| 86 |
+
background: var(--surface-bg);
|
| 87 |
+
color: var(--text-color);
|
| 88 |
+
box-shadow: 0 4px 24px rgba(0,0,0,.18);
|
| 89 |
+
opacity: 0;
|
| 90 |
+
transition: opacity 0.12s ease;
|
| 91 |
+
z-index: 10;
|
| 92 |
+
max-width: 280px;
|
| 93 |
+
}
|
| 94 |
+
|
| 95 |
+
.d3-complexity-analysis .d3-tooltip .model-name {
|
| 96 |
+
font-weight: 600;
|
| 97 |
+
margin-bottom: 4px;
|
| 98 |
+
}
|
| 99 |
+
|
| 100 |
+
.d3-complexity-analysis .d3-tooltip .metric {
|
| 101 |
+
display: flex;
|
| 102 |
+
justify-content: space-between;
|
| 103 |
+
gap: 16px;
|
| 104 |
+
}
|
| 105 |
+
|
| 106 |
+
.d3-complexity-analysis .d3-tooltip .metric-label {
|
| 107 |
+
color: var(--muted-color);
|
| 108 |
+
}
|
| 109 |
+
|
| 110 |
+
.d3-complexity-analysis .d3-tooltip .metric-value {
|
| 111 |
+
font-weight: 500;
|
| 112 |
+
}
|
| 113 |
+
|
| 114 |
+
.d3-complexity-analysis .d3-tooltip .interpretation {
|
| 115 |
+
margin-top: 6px;
|
| 116 |
+
font-size: 11px;
|
| 117 |
+
color: var(--muted-color);
|
| 118 |
+
font-style: italic;
|
| 119 |
+
}
|
| 120 |
+
</style>
|
| 121 |
+
<script>
|
| 122 |
+
(() => {
|
| 123 |
+
const ensureD3 = (cb) => {
|
| 124 |
+
if (window.d3 && typeof window.d3.select === 'function') return cb();
|
| 125 |
+
let s = document.getElementById('d3-cdn-script');
|
| 126 |
+
if (!s) {
|
| 127 |
+
s = document.createElement('script');
|
| 128 |
+
s.id = 'd3-cdn-script';
|
| 129 |
+
s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
|
| 130 |
+
document.head.appendChild(s);
|
| 131 |
+
}
|
| 132 |
+
const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
|
| 133 |
+
s.addEventListener('load', onReady, { once: true });
|
| 134 |
+
if (window.d3) onReady();
|
| 135 |
+
};
|
| 136 |
+
|
| 137 |
+
const bootstrap = () => {
|
| 138 |
+
const scriptEl = document.currentScript;
|
| 139 |
+
let container = scriptEl ? scriptEl.previousElementSibling : null;
|
| 140 |
+
if (!(container && container.classList && container.classList.contains('d3-complexity-analysis'))) {
|
| 141 |
+
const candidates = Array.from(document.querySelectorAll('.d3-complexity-analysis'))
|
| 142 |
+
.filter((el) => !(el.dataset && el.dataset.mounted === 'true'));
|
| 143 |
+
container = candidates[candidates.length - 1] || null;
|
| 144 |
+
}
|
| 145 |
+
if (!container) return;
|
| 146 |
+
if (container.dataset) {
|
| 147 |
+
if (container.dataset.mounted === 'true') return;
|
| 148 |
+
container.dataset.mounted = 'true';
|
| 149 |
+
}
|
| 150 |
+
|
| 151 |
+
// Tooltip setup
|
| 152 |
+
container.style.position = container.style.position || 'relative';
|
| 153 |
+
const tip = document.createElement('div');
|
| 154 |
+
tip.className = 'd3-tooltip';
|
| 155 |
+
container.appendChild(tip);
|
| 156 |
+
|
| 157 |
+
// SVG setup
|
| 158 |
+
const svg = d3.select(container).append('svg');
|
| 159 |
+
const gRoot = svg.append('g');
|
| 160 |
+
|
| 161 |
+
// Chart groups
|
| 162 |
+
const gAxes = gRoot.append('g').attr('class', 'axes');
|
| 163 |
+
const gCells = gRoot.append('g').attr('class', 'cells');
|
| 164 |
+
const gLegend = gRoot.append('g').attr('class', 'legend');
|
| 165 |
+
|
| 166 |
+
// State
|
| 167 |
+
let data = null;
|
| 168 |
+
let width = 700;
|
| 169 |
+
let height = 450;
|
| 170 |
+
const margin = { top: 60, right: 100, bottom: 60, left: 160 };
|
| 171 |
+
|
| 172 |
+
// Scales
|
| 173 |
+
const xScale = d3.scaleBand();
|
| 174 |
+
const yScale = d3.scaleBand();
|
| 175 |
+
|
| 176 |
+
// Linear color scale: red (0%) -> green (100%+)
|
| 177 |
+
const colorScale = d3.scaleLinear()
|
| 178 |
+
.interpolate(() => d3.interpolateRdYlGn);
|
| 179 |
+
|
| 180 |
+
const DATA_URL = '/data/complexity_analysis.json';
|
| 181 |
+
|
| 182 |
+
function updateSize() {
|
| 183 |
+
width = Math.min(container.clientWidth || 700, 800);
|
| 184 |
+
const numModels = data ? data.models.length : 10;
|
| 185 |
+
const cellHeight = 36;
|
| 186 |
+
height = margin.top + margin.bottom + numModels * cellHeight;
|
| 187 |
+
svg.attr('width', width).attr('height', height).attr('viewBox', `0 0 ${width} ${height}`);
|
| 188 |
+
gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
|
| 189 |
+
return {
|
| 190 |
+
innerWidth: width - margin.left - margin.right,
|
| 191 |
+
innerHeight: height - margin.top - margin.bottom
|
| 192 |
+
};
|
| 193 |
+
}
|
| 194 |
+
|
| 195 |
+
function getContrastColor(hexColor) {
|
| 196 |
+
const hex = hexColor.replace('#', '');
|
| 197 |
+
const r = parseInt(hex.slice(0, 2), 16) / 255;
|
| 198 |
+
const g = parseInt(hex.slice(2, 4), 16) / 255;
|
| 199 |
+
const b = parseInt(hex.slice(4, 6), 16) / 255;
|
| 200 |
+
const luminance = 0.299 * r + 0.587 * g + 0.114 * b;
|
| 201 |
+
return luminance > 0.5 ? '#000000' : '#ffffff';
|
| 202 |
+
}
|
| 203 |
+
|
| 204 |
+
function rgbToHex(rgb) {
|
| 205 |
+
// Convert rgb(r, g, b) string to #rrggbb
|
| 206 |
+
const match = rgb.match(/rgb\((\d+),\s*(\d+),\s*(\d+)\)/);
|
| 207 |
+
if (!match) return rgb;
|
| 208 |
+
const r = parseInt(match[1], 10).toString(16).padStart(2, '0');
|
| 209 |
+
const g = parseInt(match[2], 10).toString(16).padStart(2, '0');
|
| 210 |
+
const b = parseInt(match[3], 10).toString(16).padStart(2, '0');
|
| 211 |
+
return `#${r}${g}${b}`;
|
| 212 |
+
}
|
| 213 |
+
|
| 214 |
+
function showTooltip(event, d) {
|
| 215 |
+
const rect = container.getBoundingClientRect();
|
| 216 |
+
const x = event.clientX - rect.left;
|
| 217 |
+
const y = event.clientY - rect.top;
|
| 218 |
+
|
| 219 |
+
const pct = d.score * 100;
|
| 220 |
+
const interpretation = pct > 100
|
| 221 |
+
? `Performs ${(pct - 100).toFixed(0)}% above average on ${d.quartile} rules`
|
| 222 |
+
: pct < 100
|
| 223 |
+
? `Performs ${(100 - pct).toFixed(0)}% below average on ${d.quartile} rules`
|
| 224 |
+
: 'Performs at average on these rules';
|
| 225 |
+
|
| 226 |
+
const quartileDesc = {
|
| 227 |
+
'Q1': 'Easiest (lowest complexity)',
|
| 228 |
+
'Q2': 'Easy-Medium',
|
| 229 |
+
'Q3': 'Medium-Hard',
|
| 230 |
+
'Q4': 'Hardest (highest complexity)'
|
| 231 |
+
};
|
| 232 |
+
|
| 233 |
+
tip.innerHTML = `
|
| 234 |
+
<div class="model-name">${d.model}</div>
|
| 235 |
+
<div class="metric">
|
| 236 |
+
<span class="metric-label">Quartile:</span>
|
| 237 |
+
<span class="metric-value">${d.quartile}</span>
|
| 238 |
+
</div>
|
| 239 |
+
<div class="metric">
|
| 240 |
+
<span class="metric-label">Difficulty:</span>
|
| 241 |
+
<span class="metric-value">${quartileDesc[d.quartile]}</span>
|
| 242 |
+
</div>
|
| 243 |
+
<div class="metric">
|
| 244 |
+
<span class="metric-label">Relative Score:</span>
|
| 245 |
+
<span class="metric-value">${pct.toFixed(0)}%</span>
|
| 246 |
+
</div>
|
| 247 |
+
<div class="interpretation">${interpretation}</div>
|
| 248 |
+
`;
|
| 249 |
+
|
| 250 |
+
const tipWidth = tip.offsetWidth || 200;
|
| 251 |
+
const tipHeight = tip.offsetHeight || 120;
|
| 252 |
+
let tipX = x + 12;
|
| 253 |
+
let tipY = y - tipHeight / 2;
|
| 254 |
+
|
| 255 |
+
if (tipX + tipWidth > width) tipX = x - tipWidth - 12;
|
| 256 |
+
if (tipY < 0) tipY = 8;
|
| 257 |
+
if (tipY + tipHeight > height) tipY = height - tipHeight - 8;
|
| 258 |
+
|
| 259 |
+
tip.style.transform = `translate(${tipX}px, ${tipY}px)`;
|
| 260 |
+
tip.style.opacity = '1';
|
| 261 |
+
}
|
| 262 |
+
|
| 263 |
+
function hideTooltip() {
|
| 264 |
+
tip.style.opacity = '0';
|
| 265 |
+
tip.style.transform = 'translate(-9999px, -9999px)';
|
| 266 |
+
}
|
| 267 |
+
|
| 268 |
+
function render() {
|
| 269 |
+
if (!data) return;
|
| 270 |
+
|
| 271 |
+
const { innerWidth, innerHeight } = updateSize();
|
| 272 |
+
const quartiles = data.quartiles;
|
| 273 |
+
const models = data.models;
|
| 274 |
+
|
| 275 |
+
// Update scales
|
| 276 |
+
xScale
|
| 277 |
+
.domain(quartiles)
|
| 278 |
+
.range([0, innerWidth])
|
| 279 |
+
.padding(0.08);
|
| 280 |
+
|
| 281 |
+
yScale
|
| 282 |
+
.domain(models.map(m => m.name))
|
| 283 |
+
.range([0, innerHeight])
|
| 284 |
+
.padding(0.08);
|
| 285 |
+
|
| 286 |
+
// Find score extent for color scale (in percentage: 0-100%+)
|
| 287 |
+
const allScores = [];
|
| 288 |
+
models.forEach(m => {
|
| 289 |
+
quartiles.forEach(q => {
|
| 290 |
+
allScores.push(m.quartile_scores[q] * 100);
|
| 291 |
+
});
|
| 292 |
+
});
|
| 293 |
+
const minPct = Math.min(...allScores);
|
| 294 |
+
const maxPct = Math.max(...allScores);
|
| 295 |
+
// Linear scale from 0% (red) to 100%+ (green)
|
| 296 |
+
colorScale.domain([0, maxPct]);
|
| 297 |
+
|
| 298 |
+
// Build cell data (with percentage values)
|
| 299 |
+
const cellData = [];
|
| 300 |
+
models.forEach(m => {
|
| 301 |
+
quartiles.forEach(q => {
|
| 302 |
+
cellData.push({
|
| 303 |
+
model: m.name,
|
| 304 |
+
quartile: q,
|
| 305 |
+
score: m.quartile_scores[q],
|
| 306 |
+
pct: m.quartile_scores[q] * 100
|
| 307 |
+
});
|
| 308 |
+
});
|
| 309 |
+
});
|
| 310 |
+
|
| 311 |
+
// Draw cells
|
| 312 |
+
gCells.selectAll('.cell')
|
| 313 |
+
.data(cellData, d => `${d.model}-${d.quartile}`)
|
| 314 |
+
.join('rect')
|
| 315 |
+
.attr('class', 'cell')
|
| 316 |
+
.attr('x', d => xScale(d.quartile))
|
| 317 |
+
.attr('y', d => yScale(d.model))
|
| 318 |
+
.attr('width', xScale.bandwidth())
|
| 319 |
+
.attr('height', yScale.bandwidth())
|
| 320 |
+
.attr('fill', d => colorScale(d.pct))
|
| 321 |
+
.attr('rx', 4)
|
| 322 |
+
.on('mouseenter', showTooltip)
|
| 323 |
+
.on('mousemove', showTooltip)
|
| 324 |
+
.on('mouseleave', hideTooltip);
|
| 325 |
+
|
| 326 |
+
// Draw cell text
|
| 327 |
+
gCells.selectAll('.cell-text')
|
| 328 |
+
.data(cellData, d => `${d.model}-${d.quartile}`)
|
| 329 |
+
.join('text')
|
| 330 |
+
.attr('class', 'cell-text')
|
| 331 |
+
.attr('x', d => xScale(d.quartile) + xScale.bandwidth() / 2)
|
| 332 |
+
.attr('y', d => yScale(d.model) + yScale.bandwidth() / 2)
|
| 333 |
+
.attr('text-anchor', 'middle')
|
| 334 |
+
.attr('dominant-baseline', 'central')
|
| 335 |
+
.style('fill', d => {
|
| 336 |
+
const bgColor = colorScale(d.pct);
|
| 337 |
+
const hex = bgColor.startsWith('rgb') ? rgbToHex(bgColor) : bgColor;
|
| 338 |
+
return getContrastColor(hex);
|
| 339 |
+
})
|
| 340 |
+
.text(d => `${d.pct.toFixed(0)}%`);
|
| 341 |
+
|
| 342 |
+
// Model labels (Y-axis)
|
| 343 |
+
gAxes.selectAll('.model-label')
|
| 344 |
+
.data(models, d => d.name)
|
| 345 |
+
.join('text')
|
| 346 |
+
.attr('class', 'model-label')
|
| 347 |
+
.attr('x', -10)
|
| 348 |
+
.attr('y', d => yScale(d.name) + yScale.bandwidth() / 2)
|
| 349 |
+
.attr('text-anchor', 'end')
|
| 350 |
+
.attr('dominant-baseline', 'central')
|
| 351 |
+
.text(d => d.name);
|
| 352 |
+
|
| 353 |
+
// Quartile labels (X-axis)
|
| 354 |
+
gAxes.selectAll('.quartile-label')
|
| 355 |
+
.data(quartiles)
|
| 356 |
+
.join('text')
|
| 357 |
+
.attr('class', 'quartile-label')
|
| 358 |
+
.attr('x', d => xScale(d) + xScale.bandwidth() / 2)
|
| 359 |
+
.attr('y', -10)
|
| 360 |
+
.attr('text-anchor', 'middle')
|
| 361 |
+
.text(d => d);
|
| 362 |
+
|
| 363 |
+
// X-axis title
|
| 364 |
+
gAxes.selectAll('.x-title')
|
| 365 |
+
.data([0])
|
| 366 |
+
.join('text')
|
| 367 |
+
.attr('class', 'x-title axis-label')
|
| 368 |
+
.attr('x', innerWidth / 2)
|
| 369 |
+
.attr('y', innerHeight + 40)
|
| 370 |
+
.attr('text-anchor', 'middle')
|
| 371 |
+
.text('Complexity Quartile (Q1 = easiest)');
|
| 372 |
+
|
| 373 |
+
// Chart title
|
| 374 |
+
gAxes.selectAll('.chart-title')
|
| 375 |
+
.data([0])
|
| 376 |
+
.join('text')
|
| 377 |
+
.attr('class', 'chart-title')
|
| 378 |
+
.attr('x', innerWidth / 2)
|
| 379 |
+
.attr('y', -35)
|
| 380 |
+
.attr('text-anchor', 'middle')
|
| 381 |
+
.text('Model Performance by Rule Complexity');
|
| 382 |
+
|
| 383 |
+
// Legend
|
| 384 |
+
const legendWidth = 20;
|
| 385 |
+
const legendHeight = innerHeight * 0.6;
|
| 386 |
+
const legendX = innerWidth + 30;
|
| 387 |
+
const legendY = (innerHeight - legendHeight) / 2;
|
| 388 |
+
|
| 389 |
+
// Create gradient
|
| 390 |
+
const gradientId = 'complexity-legend-gradient';
|
| 391 |
+
let defs = svg.select('defs');
|
| 392 |
+
if (defs.empty()) {
|
| 393 |
+
defs = svg.append('defs');
|
| 394 |
+
}
|
| 395 |
+
|
| 396 |
+
defs.selectAll(`#${gradientId}`).remove();
|
| 397 |
+
const gradient = defs.append('linearGradient')
|
| 398 |
+
.attr('id', gradientId)
|
| 399 |
+
.attr('x1', '0%')
|
| 400 |
+
.attr('x2', '0%')
|
| 401 |
+
.attr('y1', '100%')
|
| 402 |
+
.attr('y2', '0%');
|
| 403 |
+
|
| 404 |
+
const numStops = 11;
|
| 405 |
+
for (let i = 0; i <= numStops; i++) {
|
| 406 |
+
const t = i / numStops;
|
| 407 |
+
const value = t * maxPct;
|
| 408 |
+
gradient.append('stop')
|
| 409 |
+
.attr('offset', `${t * 100}%`)
|
| 410 |
+
.attr('stop-color', colorScale(value));
|
| 411 |
+
}
|
| 412 |
+
|
| 413 |
+
// Legend rectangle
|
| 414 |
+
gLegend.selectAll('.legend-rect')
|
| 415 |
+
.data([0])
|
| 416 |
+
.join('rect')
|
| 417 |
+
.attr('class', 'legend-rect')
|
| 418 |
+
.attr('x', legendX)
|
| 419 |
+
.attr('y', legendY)
|
| 420 |
+
.attr('width', legendWidth)
|
| 421 |
+
.attr('height', legendHeight)
|
| 422 |
+
.attr('fill', `url(#${gradientId})`)
|
| 423 |
+
.attr('rx', 2)
|
| 424 |
+
.attr('stroke', 'var(--border-color)')
|
| 425 |
+
.attr('stroke-width', 0.5);
|
| 426 |
+
|
| 427 |
+
// Legend ticks (in percentage)
|
| 428 |
+
const legendScale = d3.scaleLinear()
|
| 429 |
+
.domain([0, maxPct])
|
| 430 |
+
.range([legendY + legendHeight, legendY]);
|
| 431 |
+
|
| 432 |
+
// Generate nice tick values for percentage scale
|
| 433 |
+
const tickValues = [0, 50, 100];
|
| 434 |
+
const topTick = Math.floor(maxPct / 10) * 10; if (topTick > 100) tickValues.push(topTick);
|
| 435 |
+
|
| 436 |
+
gLegend.selectAll('.legend-tick')
|
| 437 |
+
.data(tickValues.filter(v => v <= maxPct))
|
| 438 |
+
.join('text')
|
| 439 |
+
.attr('class', 'legend-tick')
|
| 440 |
+
.attr('x', legendX + legendWidth + 6)
|
| 441 |
+
.attr('y', d => legendScale(d))
|
| 442 |
+
.attr('dominant-baseline', 'middle')
|
| 443 |
+
.text(d => `${d}%`);
|
| 444 |
+
|
| 445 |
+
// Legend title
|
| 446 |
+
gLegend.selectAll('.legend-title')
|
| 447 |
+
.data([0])
|
| 448 |
+
.join('text')
|
| 449 |
+
.attr('class', 'legend-title')
|
| 450 |
+
.attr('x', legendX + legendWidth / 2)
|
| 451 |
+
.attr('y', legendY - 12)
|
| 452 |
+
.attr('text-anchor', 'middle')
|
| 453 |
+
.text('Relative Score');
|
| 454 |
+
}
|
| 455 |
+
|
| 456 |
+
// Initialize
|
| 457 |
+
fetch(DATA_URL, { cache: 'no-cache' })
|
| 458 |
+
.then(r => { if (!r.ok) throw new Error(`HTTP ${r.status}`); return r.json(); })
|
| 459 |
+
.then(json => {
|
| 460 |
+
data = json;
|
| 461 |
+
render();
|
| 462 |
+
})
|
| 463 |
+
.catch(err => {
|
| 464 |
+
const pre = document.createElement('pre');
|
| 465 |
+
pre.style.color = 'red';
|
| 466 |
+
pre.style.padding = '16px';
|
| 467 |
+
pre.textContent = `Error loading data: ${err.message}`;
|
| 468 |
+
container.appendChild(pre);
|
| 469 |
+
});
|
| 470 |
+
|
| 471 |
+
// Resize handling
|
| 472 |
+
if (window.ResizeObserver) {
|
| 473 |
+
new ResizeObserver(() => render()).observe(container);
|
| 474 |
+
} else {
|
| 475 |
+
window.addEventListener('resize', render);
|
| 476 |
+
}
|
| 477 |
+
|
| 478 |
+
// Theme change handling
|
| 479 |
+
const observer = new MutationObserver(() => render());
|
| 480 |
+
observer.observe(document.documentElement, {
|
| 481 |
+
attributes: true,
|
| 482 |
+
attributeFilter: ['data-theme']
|
| 483 |
+
});
|
| 484 |
+
};
|
| 485 |
+
|
| 486 |
+
if (document.readyState === 'loading') {
|
| 487 |
+
document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
|
| 488 |
+
} else {
|
| 489 |
+
ensureD3(bootstrap);
|
| 490 |
+
}
|
| 491 |
+
})();
|
| 492 |
+
</script>
|
app/src/content/embeds/confidence-distribution.html
ADDED
|
@@ -0,0 +1,495 @@
|
| 1 |
+
<div class="d3-confidence-distribution"></div>
|
| 2 |
+
<style>
|
| 3 |
+
.d3-confidence-distribution {
|
| 4 |
+
width: 100%;
|
| 5 |
+
margin: 10px 0;
|
| 6 |
+
position: relative;
|
| 7 |
+
font-family: system-ui, -apple-system, sans-serif;
|
| 8 |
+
}
|
| 9 |
+
|
| 10 |
+
.d3-confidence-distribution svg {
|
| 11 |
+
display: block;
|
| 12 |
+
width: 100%;
|
| 13 |
+
height: auto;
|
| 14 |
+
}
|
| 15 |
+
|
| 16 |
+
.d3-confidence-distribution .axes path,
|
| 17 |
+
.d3-confidence-distribution .axes line {
|
| 18 |
+
stroke: var(--axis-color, var(--text-color));
|
| 19 |
+
}
|
| 20 |
+
|
| 21 |
+
.d3-confidence-distribution .axes text {
|
| 22 |
+
fill: var(--tick-color, var(--muted-color));
|
| 23 |
+
font-size: 11px;
|
| 24 |
+
}
|
| 25 |
+
|
| 26 |
+
.d3-confidence-distribution .grid line {
|
| 27 |
+
stroke: var(--grid-color, rgba(0,0,0,.08));
|
| 28 |
+
}
|
| 29 |
+
|
| 30 |
+
.d3-confidence-distribution .axes text.axis-label {
|
| 31 |
+
font-size: 14px;
|
| 32 |
+
font-weight: 500;
|
| 33 |
+
fill: var(--text-color);
|
| 34 |
+
}
|
| 35 |
+
|
| 36 |
+
.d3-confidence-distribution .x-axis text {
|
| 37 |
+
transform: translateY(4px);
|
| 38 |
+
}
|
| 39 |
+
|
| 40 |
+
.d3-confidence-distribution .distribution-line {
|
| 41 |
+
fill: none;
|
| 42 |
+
stroke-width: 1.5;
|
| 43 |
+
}
|
| 44 |
+
|
| 45 |
+
.d3-confidence-distribution .data-point {
|
| 46 |
+
cursor: pointer;
|
| 47 |
+
transition: opacity 0.15s ease;
|
| 48 |
+
}
|
| 49 |
+
|
| 50 |
+
.d3-confidence-distribution .data-point:hover {
|
| 51 |
+
opacity: 0.8;
|
| 52 |
+
}
|
| 53 |
+
|
| 54 |
+
.d3-confidence-distribution .legend {
|
| 55 |
+
font-size: 11px;
|
| 56 |
+
}
|
| 57 |
+
|
| 58 |
+
.d3-confidence-distribution .legend-item {
|
| 59 |
+
cursor: pointer;
|
| 60 |
+
}
|
| 61 |
+
|
| 62 |
+
.d3-confidence-distribution .legend-item.dimmed .legend-line,
|
| 63 |
+
.d3-confidence-distribution .legend-item.dimmed .legend-marker {
|
| 64 |
+
opacity: 0.3;
|
| 65 |
+
}
|
| 66 |
+
|
| 67 |
+
.d3-confidence-distribution .legend-item.dimmed text {
|
| 68 |
+
opacity: 0.4;
|
| 69 |
+
}
|
| 70 |
+
|
| 71 |
+
.d3-confidence-distribution .legend-text {
|
| 72 |
+
fill: var(--text-color);
|
| 73 |
+
}
|
| 74 |
+
|
| 75 |
+
.d3-confidence-distribution .d3-tooltip {
|
| 76 |
+
position: absolute;
|
| 77 |
+
top: 0;
|
| 78 |
+
left: 0;
|
| 79 |
+
transform: translate(-9999px, -9999px);
|
| 80 |
+
pointer-events: none;
|
| 81 |
+
padding: 10px 12px;
|
| 82 |
+
border-radius: 8px;
|
| 83 |
+
font-size: 12px;
|
| 84 |
+
line-height: 1.4;
|
| 85 |
+
border: 1px solid var(--border-color);
|
| 86 |
+
background: var(--surface-bg);
|
| 87 |
+
color: var(--text-color);
|
| 88 |
+
box-shadow: 0 4px 24px rgba(0,0,0,.18);
|
| 89 |
+
opacity: 0;
|
| 90 |
+
transition: opacity 0.12s ease;
|
| 91 |
+
z-index: 10;
|
| 92 |
+
}
|
| 93 |
+
|
| 94 |
+
.d3-confidence-distribution .d3-tooltip .model-name {
|
| 95 |
+
font-weight: 600;
|
| 96 |
+
margin-bottom: 4px;
|
| 97 |
+
}
|
| 98 |
+
|
| 99 |
+
.d3-confidence-distribution .d3-tooltip .metric {
|
| 100 |
+
display: flex;
|
| 101 |
+
justify-content: space-between;
|
| 102 |
+
gap: 16px;
|
| 103 |
+
}
|
| 104 |
+
|
| 105 |
+
.d3-confidence-distribution .d3-tooltip .metric-label {
|
| 106 |
+
color: var(--muted-color);
|
| 107 |
+
}
|
| 108 |
+
|
| 109 |
+
.d3-confidence-distribution .d3-tooltip .metric-value {
|
| 110 |
+
font-weight: 500;
|
| 111 |
+
}
|
| 112 |
+
</style>
|
| 113 |
+
<script>
|
| 114 |
+
(() => {
|
| 115 |
+
const ensureD3 = (cb) => {
|
| 116 |
+
if (window.d3 && typeof window.d3.select === 'function') return cb();
|
| 117 |
+
let s = document.getElementById('d3-cdn-script');
|
| 118 |
+
if (!s) {
|
| 119 |
+
s = document.createElement('script');
|
| 120 |
+
s.id = 'd3-cdn-script';
|
| 121 |
+
s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
|
| 122 |
+
document.head.appendChild(s);
|
| 123 |
+
}
|
| 124 |
+
const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
|
| 125 |
+
s.addEventListener('load', onReady, { once: true });
|
| 126 |
+
if (window.d3) onReady();
|
| 127 |
+
};
|
| 128 |
+
|
| 129 |
+
const bootstrap = () => {
|
| 130 |
+
const scriptEl = document.currentScript;
|
| 131 |
+
let container = scriptEl ? scriptEl.previousElementSibling : null;
|
| 132 |
+
if (!(container && container.classList && container.classList.contains('d3-confidence-distribution'))) {
|
| 133 |
+
const candidates = Array.from(document.querySelectorAll('.d3-confidence-distribution'))
|
| 134 |
+
.filter((el) => !(el.dataset && el.dataset.mounted === 'true'));
|
| 135 |
+
container = candidates[candidates.length - 1] || null;
|
| 136 |
+
}
|
| 137 |
+
if (!container) return;
|
| 138 |
+
if (container.dataset) {
|
| 139 |
+
if (container.dataset.mounted === 'true') return;
|
| 140 |
+
container.dataset.mounted = 'true';
|
| 141 |
+
}
|
| 142 |
+
|
| 143 |
+
// Tooltip setup
|
| 144 |
+
container.style.position = container.style.position || 'relative';
|
| 145 |
+
const tip = document.createElement('div');
|
| 146 |
+
tip.className = 'd3-tooltip';
|
| 147 |
+
container.appendChild(tip);
|
| 148 |
+
|
| 149 |
+
// SVG setup
|
| 150 |
+
const svg = d3.select(container).append('svg');
|
| 151 |
+
const gRoot = svg.append('g');
|
| 152 |
+
|
| 153 |
+
// Chart groups (order matters for layering)
|
| 154 |
+
const gGrid = gRoot.append('g').attr('class', 'grid');
|
| 155 |
+
const gLines = gRoot.append('g').attr('class', 'lines');
|
| 156 |
+
const gPoints = gRoot.append('g').attr('class', 'points');
|
| 157 |
+
const gAxes = gRoot.append('g').attr('class', 'axes');
|
| 158 |
+
const gLegend = gRoot.append('g').attr('class', 'legend');
|
| 159 |
+
|
| 160 |
+
// State
|
| 161 |
+
let data = null;
|
| 162 |
+
let width = 800;
|
| 163 |
+
let height = 500;
|
| 164 |
+
const margin = { top: 20, right: 180, bottom: 56, left: 72 };
|
| 165 |
+
let hiddenModels = new Set();
|
| 166 |
+
|
| 167 |
+
// Scales
|
| 168 |
+
const xScale = d3.scaleLinear();
|
| 169 |
+
const yScale = d3.scaleLinear();
|
| 170 |
+
|
| 171 |
+
// Line generator
|
| 172 |
+
const line = d3.line()
|
| 173 |
+
.x(d => xScale(d.confidence_level))
|
| 174 |
+
.y(d => yScale(d.proportion));
|
| 175 |
+
|
| 176 |
+
// Data loading
|
| 177 |
+
const DATA_URL = '/data/confidence_distribution.json';
|
| 178 |
+
|
| 179 |
+
function updateSize() {
|
| 180 |
+
width = container.clientWidth || 800;
|
| 181 |
+
const availableWidth = width - margin.left - margin.right;
|
| 182 |
+
const maxHeight = Math.round(width * 0.7);
|
| 183 |
+
const innerSize = Math.min(availableWidth, maxHeight - margin.top - margin.bottom);
|
| 184 |
+
height = innerSize + margin.top + margin.bottom;
|
| 185 |
+
svg.attr('width', width).attr('height', height).attr('viewBox', `0 0 ${width} ${height}`);
|
| 186 |
+
gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
|
| 187 |
+
return {
|
| 188 |
+
innerWidth: width - margin.left - margin.right,
|
| 189 |
+
innerHeight: height - margin.top - margin.bottom
|
| 190 |
+
};
|
| 191 |
+
}
|
| 192 |
+
|
| 193 |
+
function showTooltip(event, d, model) {
|
| 194 |
+
const rect = container.getBoundingClientRect();
|
| 195 |
+
const x = event.clientX - rect.left;
|
| 196 |
+
const y = event.clientY - rect.top;
|
| 197 |
+
|
| 198 |
+
tip.innerHTML = `
|
| 199 |
+
<div class="model-name" style="color: ${model.color}">${model.name}</div>
|
| 200 |
+
<div class="metric">
|
| 201 |
+
<span class="metric-label">Confidence level:</span>
|
| 202 |
+
<span class="metric-value">${d.confidence_level * 10}%</span>
|
| 203 |
+
</div>
|
| 204 |
+
<div class="metric">
|
| 205 |
+
<span class="metric-label">Proportion:</span>
|
| 206 |
+
<span class="metric-value">${(d.proportion * 100).toFixed(1)}%</span>
|
| 207 |
+
</div>
|
| 208 |
+
<div class="metric">
|
| 209 |
+
<span class="metric-label">Count:</span>
|
| 210 |
+
<span class="metric-value">${d.count} / ${model.total_guesses}</span>
|
| 211 |
+
</div>
|
| 212 |
+
`;
|
| 213 |
+
|
| 214 |
+
const tipWidth = tip.offsetWidth || 150;
|
| 215 |
+
const tipHeight = tip.offsetHeight || 100;
|
| 216 |
+
let tipX = x + 12;
|
| 217 |
+
let tipY = y - tipHeight / 2;
|
| 218 |
+
|
| 219 |
+
if (tipX + tipWidth > width) tipX = x - tipWidth - 12;
|
| 220 |
+
if (tipY < 0) tipY = 8;
|
| 221 |
+
if (tipY + tipHeight > height) tipY = height - tipHeight - 8;
|
| 222 |
+
|
| 223 |
+
tip.style.transform = `translate(${tipX}px, ${tipY}px)`;
|
| 224 |
+
tip.style.opacity = '1';
|
| 225 |
+
}
|
| 226 |
+
|
| 227 |
+
function hideTooltip() {
|
| 228 |
+
tip.style.opacity = '0';
|
| 229 |
+
tip.style.transform = 'translate(-9999px, -9999px)';
|
| 230 |
+
}
|
| 231 |
+
|
| 232 |
+
function toggleModel(modelName) {
|
| 233 |
+
if (hiddenModels.has(modelName)) {
|
| 234 |
+
hiddenModels.delete(modelName);
|
| 235 |
+
} else {
|
| 236 |
+
hiddenModels.add(modelName);
|
| 237 |
+
}
|
| 238 |
+
render();
|
| 239 |
+
}
|
| 240 |
+
|
| 241 |
+
function render() {
|
| 242 |
+
if (!data) return;
|
| 243 |
+
|
| 244 |
+
const { innerWidth, innerHeight } = updateSize();
|
| 245 |
+
const models = data.models;
|
| 246 |
+
const visibleModels = models.filter(m => !hiddenModels.has(m.name));
|
| 247 |
+
|
| 248 |
+
// X scale: confidence levels 5-10
|
| 249 |
+
xScale
|
| 250 |
+
.domain([5, 10])
|
| 251 |
+
.range([0, innerWidth]);
|
| 252 |
+
|
| 253 |
+
// Y scale: proportion (0 to max + padding)
|
| 254 |
+
const maxProportion = d3.max(visibleModels, m =>
|
| 255 |
+
d3.max(m.distribution, d => d.proportion)
|
| 256 |
+
) || 0.8;
|
| 257 |
+
yScale
|
| 258 |
+
.domain([0, Math.min(1, maxProportion * 1.1)])
|
| 259 |
+
.range([innerHeight, 0])
|
| 260 |
+
.nice();
|
| 261 |
+
|
| 262 |
+
// Grid lines
|
| 263 |
+
const xTicks = [5, 6, 7, 8, 9, 10];
|
| 264 |
+
const yTicks = yScale.ticks(6);
|
| 265 |
+
|
| 266 |
+
gGrid.selectAll('.grid-x')
|
| 267 |
+
.data(xTicks)
|
| 268 |
+
.join('line')
|
| 269 |
+
.attr('class', 'grid-x')
|
| 270 |
+
.attr('x1', d => xScale(d))
|
| 271 |
+
.attr('x2', d => xScale(d))
|
| 272 |
+
.attr('y1', 0)
|
| 273 |
+
.attr('y2', innerHeight);
|
| 274 |
+
|
| 275 |
+
gGrid.selectAll('.grid-y')
|
| 276 |
+
.data(yTicks)
|
| 277 |
+
.join('line')
|
| 278 |
+
.attr('class', 'grid-y')
|
| 279 |
+
.attr('x1', 0)
|
| 280 |
+
.attr('x2', innerWidth)
|
| 281 |
+
.attr('y1', d => yScale(d))
|
| 282 |
+
.attr('y2', d => yScale(d));
|
| 283 |
+
|
| 284 |
+
// Axes
|
| 285 |
+
const tickSize = 6;
|
| 286 |
+
const percentFormat = d => `${Math.round(d * 100)}%`;
|
| 287 |
+
|
| 288 |
+
gAxes.selectAll('.x-axis')
|
| 289 |
+
.data([0])
|
| 290 |
+
.join('g')
|
| 291 |
+
.attr('class', 'x-axis')
|
| 292 |
+
.attr('transform', `translate(0,${innerHeight})`)
|
| 293 |
+
.call(d3.axisBottom(xScale)
|
| 294 |
+
.tickValues(xTicks)
|
| 295 |
+
.tickFormat(d => d)
|
| 296 |
+
.tickSizeInner(-tickSize)
|
| 297 |
+
.tickSizeOuter(0));
|
| 298 |
+
|
| 299 |
+
gAxes.selectAll('.y-axis')
|
| 300 |
+
.data([0])
|
| 301 |
+
.join('g')
|
| 302 |
+
.attr('class', 'y-axis')
|
| 303 |
+
.call(d3.axisLeft(yScale)
|
| 304 |
+
.ticks(6)
|
| 305 |
+
.tickFormat(percentFormat)
|
| 306 |
+
.tickSizeInner(-tickSize)
|
| 307 |
+
.tickSizeOuter(0));
|
| 308 |
+
|
| 309 |
+
// Axis labels
|
| 310 |
+
gAxes.selectAll('.x-label')
|
| 311 |
+
.data([0])
|
| 312 |
+
.join('text')
|
| 313 |
+
.attr('class', 'x-label axis-label')
|
| 314 |
+
.attr('x', innerWidth / 2)
|
| 315 |
+
.attr('y', innerHeight + 44)
|
| 316 |
+
.attr('text-anchor', 'middle')
|
| 317 |
+
.text('Confidence Level');
|
| 318 |
+
|
| 319 |
+
gAxes.selectAll('.y-label')
|
| 320 |
+
.data([0])
|
| 321 |
+
.join('text')
|
| 322 |
+
.attr('class', 'y-label axis-label')
|
| 323 |
+
.attr('x', -innerHeight / 2)
|
| 324 |
+
.attr('y', -52)
|
| 325 |
+
.attr('text-anchor', 'middle')
|
| 326 |
+
.attr('transform', 'rotate(-90)')
|
| 327 |
+
.text('Proportion of Guesses');
|
| 328 |
+
|
| 329 |
+
// Lines for each model
|
| 330 |
+
gLines.selectAll('.distribution-line')
|
| 331 |
+
.data(visibleModels, d => d.name)
|
| 332 |
+
.join('path')
|
| 333 |
+
.attr('class', 'distribution-line')
|
| 334 |
+
.attr('d', d => line(d.distribution))
|
| 335 |
+
.attr('stroke', d => d.color);
|
| 336 |
+
|
| 337 |
+
// Data points - circles for closed models, stars for open models
|
| 338 |
+
const allPoints = visibleModels.flatMap(model =>
|
| 339 |
+
model.distribution.map(p => ({ ...p, model }))
|
| 340 |
+
);
|
| 341 |
+
const closedPoints = allPoints.filter(d => !d.model.is_open);
|
| 342 |
+
const openPoints = allPoints.filter(d => d.model.is_open);
|
| 343 |
+
|
| 344 |
+
// Helper function to create a 5-point star path
|
| 345 |
+
const starPath = (cx, cy, outerR, innerR) => {
|
| 346 |
+
const points = [];
|
| 347 |
+
for (let i = 0; i < 10; i++) {
|
| 348 |
+
const r = i % 2 === 0 ? outerR : innerR;
|
| 349 |
+
const angle = (Math.PI / 2) + (i * Math.PI / 5);
|
| 350 |
+
points.push([cx + r * Math.cos(angle), cy - r * Math.sin(angle)]);
|
| 351 |
+
}
|
| 352 |
+
return 'M' + points.map(p => p.join(',')).join('L') + 'Z';
|
| 353 |
+
};
|
| 354 |
+
|
| 355 |
+
// Circles for closed models
|
| 356 |
+
gPoints.selectAll('.data-point-circle')
|
| 357 |
+
.data(closedPoints, d => `${d.model.name}-${d.confidence_level}`)
|
| 358 |
+
.join('circle')
|
| 359 |
+
.attr('class', 'data-point data-point-circle')
|
| 360 |
+
.attr('cx', d => xScale(d.confidence_level))
|
| 361 |
+
.attr('cy', d => yScale(d.proportion))
|
| 362 |
+
.attr('r', 4)
|
| 363 |
+
.attr('fill', d => d.model.color)
|
| 364 |
+
.attr('stroke', 'var(--surface-bg, white)')
|
| 365 |
+
.attr('stroke-width', 1)
|
| 366 |
+
.on('mouseenter', (event, d) => showTooltip(event, d, d.model))
|
| 367 |
+
.on('mousemove', (event, d) => showTooltip(event, d, d.model))
|
| 368 |
+
.on('mouseleave', hideTooltip);
|
| 369 |
+
|
| 370 |
+
// Stars for open models
|
| 371 |
+
gPoints.selectAll('.data-point-star')
|
| 372 |
+
.data(openPoints, d => `${d.model.name}-${d.confidence_level}`)
|
| 373 |
+
.join('path')
|
| 374 |
+
.attr('class', 'data-point data-point-star')
|
| 375 |
+
.attr('d', d => starPath(
|
| 376 |
+
xScale(d.confidence_level),
|
| 377 |
+
yScale(d.proportion),
|
| 378 |
+
6, 2.6
|
| 379 |
+
))
|
| 380 |
+
.attr('fill', d => d.model.color)
|
| 381 |
+
.attr('stroke', 'var(--surface-bg, white)')
|
| 382 |
+
.attr('stroke-width', 0.8)
|
| 383 |
+
.on('mouseenter', (event, d) => showTooltip(event, d, d.model))
|
| 384 |
+
.on('mousemove', (event, d) => showTooltip(event, d, d.model))
|
| 385 |
+
.on('mouseleave', hideTooltip);
|
| 386 |
+
|
| 387 |
+
// Legend
|
| 388 |
+
const legendX = innerWidth + 16;
|
| 389 |
+
const legendItemHeight = 20;
|
| 390 |
+
|
| 391 |
+
// Helper function for legend star
|
| 392 |
+
const legendStarPath = (cx, cy, outerR, innerR) => {
|
| 393 |
+
const points = [];
|
| 394 |
+
for (let i = 0; i < 10; i++) {
|
| 395 |
+
const r = i % 2 === 0 ? outerR : innerR;
|
| 396 |
+
const angle = (Math.PI / 2) + (i * Math.PI / 5);
|
| 397 |
+
points.push([cx + r * Math.cos(angle), cy - r * Math.sin(angle)]);
|
| 398 |
+
}
|
| 399 |
+
return 'M' + points.map(p => p.join(',')).join('L') + 'Z';
|
| 400 |
+
};
|
| 401 |
+
|
| 402 |
+
gLegend.selectAll('.legend-item')
|
| 403 |
+
.data(models, d => d.name)
|
| 404 |
+
.join('g')
|
| 405 |
+
.attr('class', d => `legend-item ${hiddenModels.has(d.name) ? 'dimmed' : ''}`)
|
| 406 |
+
.attr('transform', (d, i) => `translate(${legendX}, ${i * legendItemHeight})`)
|
| 407 |
+
.each(function(d) {
|
| 408 |
+
const g = d3.select(this);
|
| 409 |
+
g.selectAll('*').remove();
|
| 410 |
+
|
| 411 |
+
// Line segment (solid for all models)
|
| 412 |
+
g.append('line')
|
| 413 |
+
.attr('class', 'legend-line')
|
| 414 |
+
.attr('x1', 0)
|
| 415 |
+
.attr('x2', 20)
|
| 416 |
+
.attr('y1', 0)
|
| 417 |
+
.attr('y2', 0)
|
| 418 |
+
.attr('stroke', d.color)
|
| 419 |
+
.attr('stroke-width', 1.5);
|
| 420 |
+
|
| 421 |
+
// Marker - circle for closed, star for open
|
| 422 |
+
if (d.is_open) {
|
| 423 |
+
g.append('path')
|
| 424 |
+
.attr('class', 'legend-marker')
|
| 425 |
+
.attr('d', legendStarPath(10, 0, 6, 2.6))
|
| 426 |
+
.attr('fill', d.color);
|
| 427 |
+
} else {
|
| 428 |
+
g.append('circle')
|
| 429 |
+
.attr('class', 'legend-marker')
|
| 430 |
+
.attr('cx', 10)
|
| 431 |
+
.attr('cy', 0)
|
| 432 |
+
.attr('r', 3.5)
|
| 433 |
+
.attr('fill', d.color);
|
| 434 |
+
}
|
| 435 |
+
|
| 436 |
+
g.append('text')
|
| 437 |
+
.attr('class', 'legend-text')
|
| 438 |
+
.attr('x', 26)
|
| 439 |
+
.attr('y', 4)
|
| 440 |
+
.text(d.name);
|
| 441 |
+
|
| 442 |
+
g.style('cursor', 'pointer')
|
| 443 |
+
.on('click', () => toggleModel(d.name));
|
| 444 |
+
});
|
| 445 |
+
|
| 446 |
+
// Legend note
|
| 447 |
+
const noteY = models.length * legendItemHeight + 12;
|
| 448 |
+
gLegend.selectAll('.legend-note')
|
| 449 |
+
.data([0])
|
| 450 |
+
.join('text')
|
| 451 |
+
.attr('class', 'legend-note')
|
| 452 |
+
.attr('x', legendX)
|
| 453 |
+
.attr('y', noteY)
|
| 454 |
+
.attr('font-size', '10px')
|
| 455 |
+
.attr('fill', 'var(--muted-color)')
|
| 456 |
+
.text('● = Closed, ★ = Open');
|
| 457 |
+
}
|
| 458 |
+
|
| 459 |
+
// Initialize
|
| 460 |
+
fetch(DATA_URL, { cache: 'no-cache' })
|
| 461 |
+
.then(r => { if (!r.ok) throw new Error(`HTTP ${r.status}`); return r.json(); })
|
| 462 |
+
.then(json => {
|
| 463 |
+
data = json;
|
| 464 |
+
render();
|
| 465 |
+
})
|
| 466 |
+
.catch(err => {
|
| 467 |
+
const pre = document.createElement('pre');
|
| 468 |
+
pre.style.color = 'red';
|
| 469 |
+
pre.style.padding = '16px';
|
| 470 |
+
pre.textContent = `Error loading data: ${err.message}`;
|
| 471 |
+
container.appendChild(pre);
|
| 472 |
+
});
|
| 473 |
+
|
| 474 |
+
// Resize handling
|
| 475 |
+
if (window.ResizeObserver) {
|
| 476 |
+
new ResizeObserver(() => render()).observe(container);
|
| 477 |
+
} else {
|
| 478 |
+
window.addEventListener('resize', render);
|
| 479 |
+
}
|
| 480 |
+
|
| 481 |
+
// Theme change handling
|
| 482 |
+
const observer = new MutationObserver(() => render());
|
| 483 |
+
observer.observe(document.documentElement, {
|
| 484 |
+
attributes: true,
|
| 485 |
+
attributeFilter: ['data-theme']
|
| 486 |
+
});
|
| 487 |
+
};
|
| 488 |
+
|
| 489 |
+
if (document.readyState === 'loading') {
|
| 490 |
+
document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
|
| 491 |
+
} else {
|
| 492 |
+
ensureD3(bootstrap);
|
| 493 |
+
}
|
| 494 |
+
})();
|
| 495 |
+
</script>
|
app/src/content/embeds/excess-caution.html
ADDED
|
@@ -0,0 +1,384 @@
|
| 1 |
+
<div class="d3-excess-caution"></div>
|
| 2 |
+
<style>
|
| 3 |
+
.d3-excess-caution {
|
| 4 |
+
width: 100%;
|
| 5 |
+
margin: 10px 0;
|
| 6 |
+
position: relative;
|
| 7 |
+
font-family: system-ui, -apple-system, sans-serif;
|
| 8 |
+
}
|
| 9 |
+
|
| 10 |
+
.d3-excess-caution svg {
|
| 11 |
+
display: block;
|
| 12 |
+
width: 100%;
|
| 13 |
+
height: auto;
|
| 14 |
+
}
|
| 15 |
+
|
| 16 |
+
.d3-excess-caution .axes path,
|
| 17 |
+
.d3-excess-caution .axes line {
|
| 18 |
+
stroke: var(--axis-color, var(--text-color));
|
| 19 |
+
}
|
| 20 |
+
|
| 21 |
+
.d3-excess-caution .axes text {
|
| 22 |
+
fill: var(--tick-color, var(--muted-color));
|
| 23 |
+
font-size: 11px;
|
| 24 |
+
}
|
| 25 |
+
|
| 26 |
+
.d3-excess-caution .grid line {
|
| 27 |
+
stroke: var(--grid-color, rgba(0,0,0,.08));
|
| 28 |
+
}
|
| 29 |
+
|
| 30 |
+
.d3-excess-caution .axes text.axis-label {
|
| 31 |
+
font-size: 14px;
|
| 32 |
+
font-weight: 500;
|
| 33 |
+
fill: var(--text-color);
|
| 34 |
+
}
|
| 35 |
+
|
| 36 |
+
|
| 37 |
+
.d3-excess-caution .strip-point {
|
| 38 |
+
opacity: 0.5;
|
| 39 |
+
}
|
| 40 |
+
|
| 41 |
+
.d3-excess-caution .mean-line {
|
| 42 |
+
stroke-width: 4;
|
| 43 |
+
cursor: pointer;
|
| 44 |
+
}
|
| 45 |
+
|
| 46 |
+
.d3-excess-caution .mean-line:hover {
|
| 47 |
+
stroke-width: 5;
|
| 48 |
+
}
|
| 49 |
+
|
| 50 |
+
.d3-excess-caution .legend {
|
| 51 |
+
font-size: 11px;
|
| 52 |
+
}
|
| 53 |
+
|
| 54 |
+
.d3-excess-caution .legend-text {
|
| 55 |
+
fill: var(--text-color);
|
| 56 |
+
}
|
| 57 |
+
|
| 58 |
+
.d3-excess-caution .d3-tooltip {
|
| 59 |
+
position: absolute;
|
| 60 |
+
top: 0;
|
| 61 |
+
left: 0;
|
| 62 |
+
transform: translate(-9999px, -9999px);
|
| 63 |
+
pointer-events: none;
|
| 64 |
+
padding: 10px 12px;
|
| 65 |
+
border-radius: 8px;
|
| 66 |
+
font-size: 12px;
|
| 67 |
+
line-height: 1.4;
|
| 68 |
+
border: 1px solid var(--border-color);
|
| 69 |
+
background: var(--surface-bg);
|
| 70 |
+
color: var(--text-color);
|
| 71 |
+
box-shadow: 0 4px 24px rgba(0,0,0,.18);
|
| 72 |
+
opacity: 0;
|
| 73 |
+
transition: opacity 0.12s ease;
|
| 74 |
+
z-index: 10;
|
| 75 |
+
}
|
| 76 |
+
|
| 77 |
+
.d3-excess-caution .d3-tooltip .model-name {
|
| 78 |
+
font-weight: 600;
|
| 79 |
+
margin-bottom: 4px;
|
| 80 |
+
}
|
| 81 |
+
|
| 82 |
+
.d3-excess-caution .d3-tooltip .metric {
|
| 83 |
+
display: flex;
|
| 84 |
+
justify-content: space-between;
|
| 85 |
+
gap: 16px;
|
| 86 |
+
}
|
| 87 |
+
|
| 88 |
+
.d3-excess-caution .d3-tooltip .metric-label {
|
| 89 |
+
color: var(--muted-color);
|
| 90 |
+
}
|
| 91 |
+
|
| 92 |
+
.d3-excess-caution .d3-tooltip .metric-value {
|
| 93 |
+
font-weight: 500;
|
| 94 |
+
}
|
| 95 |
+
</style>
|
| 96 |
+
<script>
|
| 97 |
+
(() => {
|
| 98 |
+
const ensureD3 = (cb) => {
|
| 99 |
+
if (window.d3 && typeof window.d3.select === 'function') return cb();
|
| 100 |
+
let s = document.getElementById('d3-cdn-script');
|
| 101 |
+
if (!s) {
|
| 102 |
+
s = document.createElement('script');
|
| 103 |
+
s.id = 'd3-cdn-script';
|
| 104 |
+
s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
|
| 105 |
+
document.head.appendChild(s);
|
| 106 |
+
}
|
| 107 |
+
const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
|
| 108 |
+
s.addEventListener('load', onReady, { once: true });
|
| 109 |
+
if (window.d3) onReady();
|
| 110 |
+
};
|
| 111 |
+
|
| 112 |
+
const bootstrap = () => {
|
| 113 |
+
const scriptEl = document.currentScript;
|
| 114 |
+
let container = scriptEl ? scriptEl.previousElementSibling : null;
|
| 115 |
+
if (!(container && container.classList && container.classList.contains('d3-excess-caution'))) {
|
| 116 |
+
const candidates = Array.from(document.querySelectorAll('.d3-excess-caution'))
|
| 117 |
+
.filter((el) => !(el.dataset && el.dataset.mounted === 'true'));
|
| 118 |
+
container = candidates[candidates.length - 1] || null;
|
| 119 |
+
}
|
| 120 |
+
if (!container) return;
|
| 121 |
+
if (container.dataset) {
|
| 122 |
+
if (container.dataset.mounted === 'true') return;
|
| 123 |
+
container.dataset.mounted = 'true';
|
| 124 |
+
}
|
| 125 |
+
|
| 126 |
+
// Tooltip setup
|
| 127 |
+
container.style.position = container.style.position || 'relative';
|
| 128 |
+
const tip = document.createElement('div');
|
| 129 |
+
tip.className = 'd3-tooltip';
|
| 130 |
+
container.appendChild(tip);
|
| 131 |
+
|
| 132 |
+
// SVG setup
|
| 133 |
+
const svg = d3.select(container).append('svg');
|
| 134 |
+
const gRoot = svg.append('g');
|
| 135 |
+
|
| 136 |
+
// Chart groups
|
| 137 |
+
const gGrid = gRoot.append('g').attr('class', 'grid');
|
| 138 |
+
const gAxes = gRoot.append('g').attr('class', 'axes');
|
| 139 |
+
const gPoints = gRoot.append('g').attr('class', 'points');
|
| 140 |
+
const gMeans = gRoot.append('g').attr('class', 'means');
|
| 141 |
+
const gLegend = gRoot.append('g').attr('class', 'legend');
|
| 142 |
+
|
| 143 |
+
// State
|
| 144 |
+
let data = null;
|
| 145 |
+
let width = 800;
|
| 146 |
+
let height = 450;
|
| 147 |
+
const margin = { top: 20, right: 30, bottom: 50, left: 160 };
|
| 148 |
+
|
| 149 |
+
// Scales (swapped: X is now linear, Y is categorical)
|
| 150 |
+
const xScale = d3.scaleLinear();
|
| 151 |
+
const yScale = d3.scaleBand();
|
| 152 |
+
|
| 153 |
+
// Data loading
|
| 154 |
+
const DATA_URL = '/data/excess_caution.json';
|
| 155 |
+
|
| 156 |
+
// Seeded random for consistent jitter
|
| 157 |
+
function seededRandom(seed) {
|
| 158 |
+
const x = Math.sin(seed) * 10000;
|
| 159 |
+
return x - Math.floor(x);
|
| 160 |
+
}
|
| 161 |
+
|
| 162 |
+
// Compute quartiles from array
|
| 163 |
+
function computeQuartiles(values) {
|
| 164 |
+
const sorted = [...values].sort((a, b) => a - b);
|
| 165 |
+
const n = sorted.length;
|
| 166 |
+
const q1 = sorted[Math.floor(n * 0.25)];
|
| 167 |
+
const median = sorted[Math.floor(n * 0.5)];
|
| 168 |
+
const q3 = sorted[Math.floor(n * 0.75)];
|
| 169 |
+
return { q1, median, q3 };
|
| 170 |
+
}
|
| 171 |
+
|
| 172 |
+
function showTooltip(event, model) {
|
| 173 |
+
const rect = container.getBoundingClientRect();
|
| 174 |
+
const x = event.clientX - rect.left;
|
| 175 |
+
const y = event.clientY - rect.top;
|
| 176 |
+
const quartiles = computeQuartiles(model.values);
|
| 177 |
+
|
| 178 |
+
tip.innerHTML = `
|
| 179 |
+
<div class="model-name" style="color: ${model.color}">${model.name}</div>
|
| 180 |
+
<div class="metric">
|
| 181 |
+
<span class="metric-label">Mean:</span>
|
| 182 |
+
<span class="metric-value">${model.mean.toFixed(2)}</span>
|
| 183 |
+
</div>
|
| 184 |
+
<div class="metric">
|
| 185 |
+
<span class="metric-label">Median:</span>
|
| 186 |
+
<span class="metric-value">${quartiles.median}</span>
|
| 187 |
+
</div>
|
| 188 |
+
<div class="metric">
|
| 189 |
+
<span class="metric-label">Q1 / Q3:</span>
|
| 190 |
+
<span class="metric-value">${quartiles.q1} / ${quartiles.q3}</span>
|
| 191 |
+
</div>
|
| 192 |
+
<div class="metric">
|
| 193 |
+
<span class="metric-label">Samples:</span>
|
| 194 |
+
<span class="metric-value">${model.count}</span>
|
| 195 |
+
</div>
|
| 196 |
+
`;
|
| 197 |
+
|
| 198 |
+
const tipWidth = tip.offsetWidth || 150;
|
| 199 |
+
const tipHeight = tip.offsetHeight || 100;
|
| 200 |
+
let tipX = x + 12;
|
| 201 |
+
let tipY = y - tipHeight / 2;
|
| 202 |
+
|
| 203 |
+
if (tipX + tipWidth > width) tipX = x - tipWidth - 12;
|
| 204 |
+
if (tipY < 0) tipY = 8;
|
| 205 |
+
if (tipY + tipHeight > height) tipY = height - tipHeight - 8;
|
| 206 |
+
|
| 207 |
+
tip.style.transform = `translate(${tipX}px, ${tipY}px)`;
|
| 208 |
+
tip.style.opacity = '1';
|
| 209 |
+
}
|
| 210 |
+
|
| 211 |
+
function hideTooltip() {
|
| 212 |
+
tip.style.opacity = '0';
|
| 213 |
+
tip.style.transform = 'translate(-9999px, -9999px)';
|
| 214 |
+
}
|
| 215 |
+
|
| 216 |
+
function updateSize() {
|
| 217 |
+
width = container.clientWidth || 800;
|
| 218 |
+
// Taller chart for horizontal layout with 10 models
|
| 219 |
+
height = Math.max(400, Math.round(width * 0.6));
|
| 220 |
+
svg.attr('width', width).attr('height', height).attr('viewBox', `0 0 ${width} ${height}`);
|
| 221 |
+
gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
|
| 222 |
+
return {
|
| 223 |
+
innerWidth: width - margin.left - margin.right,
|
| 224 |
+
innerHeight: height - margin.top - margin.bottom
|
| 225 |
+
};
|
| 226 |
+
}
|
| 227 |
+
|
| 228 |
+
function render() {
|
| 229 |
+
if (!data) return;
|
| 230 |
+
|
| 231 |
+
const { innerWidth, innerHeight } = updateSize();
|
| 232 |
+
|
| 233 |
+
// Sort models by mean (descending - most cautious at top)
|
| 234 |
+
const models = [...data.models].sort((a, b) => b.mean - a.mean);
|
| 235 |
+
|
| 236 |
+
// X scale: linear (early correct turns)
|
| 237 |
+
const maxValue = d3.max(models, m => d3.max(m.values)) || 10;
|
| 238 |
+
xScale
|
| 239 |
+
.domain([0, maxValue + 0.5])
|
| 240 |
+
.range([0, innerWidth]);
|
| 241 |
+
|
| 242 |
+
// Y scale: categorical (model names)
|
| 243 |
+
yScale
|
| 244 |
+
.domain(models.map(m => m.name))
|
| 245 |
+
.range([0, innerHeight])
|
| 246 |
+
.padding(0.3);
|
| 247 |
+
|
| 248 |
+
// Grid lines (vertical)
|
| 249 |
+
const xTicks = xScale.ticks(6);
|
| 250 |
+
gGrid.selectAll('.grid-x')
|
| 251 |
+
.data(xTicks)
|
| 252 |
+
.join('line')
|
| 253 |
+
.attr('class', 'grid-x')
|
| 254 |
+
.attr('x1', d => xScale(d))
|
| 255 |
+
.attr('x2', d => xScale(d))
|
| 256 |
+
.attr('y1', 0)
|
| 257 |
+
.attr('y2', innerHeight);
|
| 258 |
+
|
| 259 |
+
// Remove old horizontal grid lines
|
| 260 |
+
gGrid.selectAll('.grid-y').remove();
|
| 261 |
+
|
| 262 |
+
// Axes
|
| 263 |
+
const tickSize = 6;
|
| 264 |
+
|
| 265 |
+
gAxes.selectAll('.x-axis')
|
| 266 |
+
.data([0])
|
| 267 |
+
.join('g')
|
| 268 |
+
.attr('class', 'x-axis')
|
| 269 |
+
.attr('transform', `translate(0,${innerHeight})`)
|
| 270 |
+
.call(d3.axisBottom(xScale)
|
| 271 |
+
.ticks(6)
|
| 272 |
+
.tickFormat(d3.format('d'))
|
| 273 |
+
.tickSizeInner(-tickSize)
|
| 274 |
+
.tickSizeOuter(0));
|
| 275 |
+
|
| 276 |
+
gAxes.selectAll('.y-axis')
|
| 277 |
+
.data([0])
|
| 278 |
+
.join('g')
|
| 279 |
+
.attr('class', 'y-axis')
|
| 280 |
+
.call(d3.axisLeft(yScale)
|
| 281 |
+
.tickSizeInner(-tickSize)
|
| 282 |
+
.tickSizeOuter(0));
|
| 283 |
+
|
| 284 |
+
// X-axis label
|
| 285 |
+
gAxes.selectAll('.x-label')
|
| 286 |
+
.data([0])
|
| 287 |
+
.join('text')
|
| 288 |
+
.attr('class', 'x-label axis-label')
|
| 289 |
+
.attr('x', innerWidth / 2)
|
| 290 |
+
.attr('y', innerHeight + 40)
|
| 291 |
+
.attr('text-anchor', 'middle')
|
| 292 |
+
.text('Early Correct Turns');
|
| 293 |
+
|
| 294 |
+
// Remove old Y-axis label
|
| 295 |
+
gAxes.selectAll('.y-label').remove();
|
| 296 |
+
|
| 297 |
+
// Create flat array of all points with horizontal jitter
|
| 298 |
+
const bandHeight = yScale.bandwidth();
|
| 299 |
+
const jitterWidth = 8; // Fixed horizontal jitter in pixels
|
| 300 |
+
const pointRadius = Math.min(2.5, bandHeight / 20);
|
| 301 |
+
|
| 302 |
+
const allPoints = models.flatMap((model, modelIdx) =>
|
| 303 |
+
model.values.map((value, i) => ({
|
| 304 |
+
model,
|
| 305 |
+
value,
|
| 306 |
+
// Seeded random jitter for consistency (horizontal)
|
| 307 |
+
jitter: (seededRandom(modelIdx * 1000 + i) - 0.5) * jitterWidth
|
| 308 |
+
}))
|
| 309 |
+
);
|
| 310 |
+
|
| 311 |
+
// Draw all points as small circles
|
| 312 |
+
gPoints.selectAll('.strip-point')
|
| 313 |
+
.data(allPoints, (d, i) => `${d.model.name}-${i}`)
|
| 314 |
+
.join('circle')
|
| 315 |
+
.attr('class', 'strip-point')
|
| 316 |
+
.attr('cx', d => xScale(d.value) + d.jitter)
|
| 317 |
+
.attr('cy', d => yScale(d.model.name) + bandHeight / 2)
|
| 318 |
+
.attr('r', pointRadius)
|
| 319 |
+
.attr('fill', d => d.model.color);
|
| 320 |
+
|
| 321 |
+
// Mean lines with hover (now vertical)
|
| 322 |
+
const meanLineHeight = bandHeight * 0.78;
|
| 323 |
+
gMeans.selectAll('.mean-line')
|
| 324 |
+
.data(models, d => d.name)
|
| 325 |
+
.join('line')
|
| 326 |
+
.attr('class', 'mean-line')
|
| 327 |
+
.attr('x1', d => xScale(d.mean))
|
| 328 |
+
.attr('x2', d => xScale(d.mean))
|
| 329 |
+
.attr('y1', d => yScale(d.name) + bandHeight / 2 - meanLineHeight / 2)
|
| 330 |
+
.attr('y2', d => yScale(d.name) + bandHeight / 2 + meanLineHeight / 2)
|
| 331 |
+
.attr('stroke', d => d.color)
|
| 332 |
+
.on('mouseenter', (event, d) => showTooltip(event, d))
|
| 333 |
+
.on('mousemove', (event, d) => showTooltip(event, d))
|
| 334 |
+
.on('mouseleave', hideTooltip);
|
| 335 |
+
|
| 336 |
+
// Legend
|
| 337 |
+
gLegend.selectAll('.legend-note')
|
| 338 |
+
.data([0])
|
| 339 |
+
.join('text')
|
| 340 |
+
.attr('class', 'legend-note legend-text')
|
| 341 |
+
.attr('x', innerWidth / 2)
|
| 342 |
+
.attr('y', innerHeight + 40)
|
| 343 |
+
.attr('text-anchor', 'middle')
|
| 344 |
+
.attr('font-size', '11px')
|
| 345 |
+
.text('');
|
| 346 |
+
}
|
| 347 |
+
|
| 348 |
+
// Initialize
|
| 349 |
+
fetch(DATA_URL, { cache: 'no-cache' })
|
| 350 |
+
.then(r => { if (!r.ok) throw new Error(`HTTP ${r.status}`); return r.json(); })
|
| 351 |
+
.then(json => {
|
| 352 |
+
data = json;
|
| 353 |
+
render();
|
| 354 |
+
})
|
| 355 |
+
.catch(err => {
|
| 356 |
+
const pre = document.createElement('pre');
|
| 357 |
+
pre.style.color = 'red';
|
| 358 |
+
pre.style.padding = '16px';
|
| 359 |
+
pre.textContent = `Error loading data: ${err.message}`;
|
| 360 |
+
container.appendChild(pre);
|
| 361 |
+
});
|
| 362 |
+
|
| 363 |
+
// Resize handling
|
| 364 |
+
if (window.ResizeObserver) {
|
| 365 |
+
new ResizeObserver(() => render()).observe(container);
|
| 366 |
+
} else {
|
| 367 |
+
window.addEventListener('resize', render);
|
| 368 |
+
}
|
| 369 |
+
|
| 370 |
+
// Theme change handling
|
| 371 |
+
const observer = new MutationObserver(() => render());
|
| 372 |
+
observer.observe(document.documentElement, {
|
| 373 |
+
attributes: true,
|
| 374 |
+
attributeFilter: ['data-theme']
|
| 375 |
+
});
|
| 376 |
+
};
|
| 377 |
+
|
| 378 |
+
if (document.readyState === 'loading') {
|
| 379 |
+
document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
|
| 380 |
+
} else {
|
| 381 |
+
ensureD3(bootstrap);
|
| 382 |
+
}
|
| 383 |
+
})();
|
| 384 |
+
</script>
|
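The strip plot in `excess-caution.html` relies on two small helpers: a seeded pseudo-random function for repeatable jitter and an index-based quartile lookup for the tooltip. A minimal standalone sketch of both follows, assuming plain JavaScript outside the browser. The `sample` array is hypothetical, and the empty-input guard is an addition that the embed does not have.

```javascript
// Deterministic pseudo-random value between 0 and 1 derived from a numeric seed.
// Same formula as the embed's seededRandom helper.
function seededRandom(seed) {
  const x = Math.sin(seed) * 10000;
  return x - Math.floor(x);
}

// Index-based quartiles, mirroring the embed's computeQuartiles helper.
// The empty-input guard is an added assumption, not part of the embed.
function computeQuartiles(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  if (n === 0) return { q1: null, median: null, q3: null };
  return {
    q1: sorted[Math.floor(n * 0.25)],
    median: sorted[Math.floor(n * 0.5)],
    q3: sorted[Math.floor(n * 0.75)],
  };
}

// Hypothetical sample values, for illustration only.
const sample = [3, 0, 5, 1, 2, 4, 6, 7];
const quartiles = computeQuartiles(sample);

// Jitter spread of 8 pixels, as in the embed's jitterWidth constant.
const jitter = (seededRandom(42) - 0.5) * 8;
console.log(quartiles, jitter);
```

Because the lookup picks an existing element rather than interpolating, the reported median of an even-length array is the upper of the two middle values (4 for the eight values above). If interpolated quartiles are preferred, D3's `d3.quantile` may be a closer fit.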
app/src/content/embeds/reckless-guessing.html
ADDED
|
@@ -0,0 +1,400 @@
|
| 1 |
+
<div class="d3-reckless-guessing"></div>
|
| 2 |
+
<style>
|
| 3 |
+
.d3-reckless-guessing {
|
| 4 |
+
width: 100%;
|
| 5 |
+
margin: 10px 0;
|
| 6 |
+
position: relative;
|
| 7 |
+
font-family: system-ui, -apple-system, sans-serif;
|
| 8 |
+
}
|
| 9 |
+
|
| 10 |
+
.d3-reckless-guessing svg {
|
| 11 |
+
display: block;
|
| 12 |
+
width: 100%;
|
| 13 |
+
height: auto;
|
| 14 |
+
}
|
| 15 |
+
|
| 16 |
+
.d3-reckless-guessing .axes path,
|
| 17 |
+
.d3-reckless-guessing .axes line {
|
| 18 |
+
stroke: var(--axis-color, var(--text-color));
|
| 19 |
+
}
|
| 20 |
+
|
| 21 |
+
.d3-reckless-guessing .axes text {
|
| 22 |
+
fill: var(--tick-color, var(--muted-color));
|
| 23 |
+
font-size: 12px;
|
| 24 |
+
}
|
| 25 |
+
|
| 26 |
+
.d3-reckless-guessing .grid line {
|
| 27 |
+
stroke: var(--grid-color, rgba(0,0,0,.08));
|
| 28 |
+
}
|
| 29 |
+
|
| 30 |
+
.d3-reckless-guessing .axes text.axis-label {
|
| 31 |
+
font-size: 14px;
|
| 32 |
+
font-weight: 500;
|
| 33 |
+
fill: var(--text-color);
|
| 34 |
+
}
|
| 35 |
+
|
| 36 |
+
.d3-reckless-guessing .axes text.chart-title {
|
| 37 |
+
font-size: 16px;
|
| 38 |
+
font-weight: 600;
|
| 39 |
+
fill: var(--text-color);
|
| 40 |
+
}
|
| 41 |
+
|
| 42 |
+
.d3-reckless-guessing .axes text.subtitle {
|
| 43 |
+
font-size: 11px;
|
| 44 |
+
font-style: italic;
|
| 45 |
+
fill: var(--muted-color);
|
| 46 |
+
}
|
| 47 |
+
|
| 48 |
+
.d3-reckless-guessing .model-label {
|
| 49 |
+
font-size: 13px;
|
| 50 |
+
font-weight: 500;
|
| 51 |
+
}
|
| 52 |
+
|
| 53 |
+
.d3-reckless-guessing .bar {
|
| 54 |
+
cursor: pointer;
|
| 55 |
+
transition: opacity 0.15s ease;
|
| 56 |
+
}
|
| 57 |
+
|
| 58 |
+
.d3-reckless-guessing .bar:hover {
|
| 59 |
+
opacity: 0.8;
|
| 60 |
+
}
|
| 61 |
+
|
| 62 |
+
|
| 63 |
+
.d3-reckless-guessing .percent-label {
|
| 64 |
+
font-size: 12px;
|
| 65 |
+
font-weight: 500;
|
| 66 |
+
fill: var(--text-color);
|
| 67 |
+
}
|
| 68 |
+
|
| 69 |
+
.d3-reckless-guessing .d3-tooltip {
|
| 70 |
+
position: absolute;
|
| 71 |
+
top: 0;
|
| 72 |
+
left: 0;
|
| 73 |
+
transform: translate(-9999px, -9999px);
|
| 74 |
+
pointer-events: none;
|
| 75 |
+
padding: 10px 12px;
|
| 76 |
+
border-radius: 8px;
|
| 77 |
+
font-size: 12px;
|
| 78 |
+
line-height: 1.4;
|
| 79 |
+
border: 1px solid var(--border-color);
|
| 80 |
+
background: var(--surface-bg);
|
| 81 |
+
color: var(--text-color);
|
| 82 |
+
box-shadow: 0 4px 24px rgba(0,0,0,.18);
|
| 83 |
+
opacity: 0;
|
| 84 |
+
transition: opacity 0.12s ease;
|
| 85 |
+
z-index: 10;
|
| 86 |
+
}
|
| 87 |
+
|
| 88 |
+
.d3-reckless-guessing .d3-tooltip .model-name {
|
| 89 |
+
font-weight: 600;
|
| 90 |
+
margin-bottom: 4px;
|
| 91 |
+
}
|
| 92 |
+
|
| 93 |
+
.d3-reckless-guessing .d3-tooltip .metric {
|
| 94 |
+
display: flex;
|
| 95 |
+
justify-content: space-between;
|
| 96 |
+
gap: 16px;
|
| 97 |
+
}
|
| 98 |
+
|
| 99 |
+
.d3-reckless-guessing .d3-tooltip .metric-label {
|
| 100 |
+
color: var(--muted-color);
|
| 101 |
+
}
|
| 102 |
+
|
| 103 |
+
.d3-reckless-guessing .d3-tooltip .metric-value {
|
| 104 |
+
font-weight: 500;
|
| 105 |
+
}
|
| 106 |
+
</style>
|
| 107 |
+
<script>
|
| 108 |
+
(() => {
|
| 109 |
+
const ensureD3 = (cb) => {
|
| 110 |
+
if (window.d3 && typeof window.d3.select === 'function') return cb();
|
| 111 |
+
let s = document.getElementById('d3-cdn-script');
|
| 112 |
+
if (!s) {
|
| 113 |
+
s = document.createElement('script');
|
| 114 |
+
s.id = 'd3-cdn-script';
|
| 115 |
+
s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
|
| 116 |
+
document.head.appendChild(s);
|
| 117 |
+
}
|
| 118 |
+
const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
|
| 119 |
+
s.addEventListener('load', onReady, { once: true });
|
| 120 |
+
if (window.d3) onReady();
|
| 121 |
+
};
|
| 122 |
+
|
| 123 |
+
const bootstrap = () => {
|
| 124 |
+
const scriptEl = document.currentScript;
|
| 125 |
+
let container = scriptEl ? scriptEl.previousElementSibling : null;
|
| 126 |
+
if (!(container && container.classList && container.classList.contains('d3-reckless-guessing'))) {
|
| 127 |
+
const candidates = Array.from(document.querySelectorAll('.d3-reckless-guessing'))
|
| 128 |
+
.filter((el) => !(el.dataset && el.dataset.mounted === 'true'));
|
| 129 |
+
container = candidates[candidates.length - 1] || null;
|
| 130 |
+
}
|
| 131 |
+
if (!container) return;
|
| 132 |
+
if (container.dataset) {
|
| 133 |
+
if (container.dataset.mounted === 'true') return;
|
| 134 |
+
container.dataset.mounted = 'true';
|
| 135 |
+
}
|
| 136 |
+
|
| 137 |
+
// Tooltip setup
|
| 138 |
+
container.style.position = container.style.position || 'relative';
|
| 139 |
+
const tip = document.createElement('div');
|
| 140 |
+
tip.className = 'd3-tooltip';
|
| 141 |
+
container.appendChild(tip);
|
| 142 |
+
|
| 143 |
+
// SVG setup
|
| 144 |
+
const svg = d3.select(container).append('svg');
|
| 145 |
+
const gRoot = svg.append('g');
|
| 146 |
+
|
| 147 |
+
// Chart groups
|
| 148 |
+
const gGrid = gRoot.append('g').attr('class', 'grid');
|
| 149 |
+
const gAxes = gRoot.append('g').attr('class', 'axes');
|
| 150 |
+
const gBars = gRoot.append('g').attr('class', 'bars');
|
| 151 |
+
const gLabels = gRoot.append('g').attr('class', 'labels');
|
| 152 |
+
|
| 153 |
+
// State
|
| 154 |
+
let data = null;
|
| 155 |
+
let width = 800;
|
| 156 |
+
let height = 450;
|
| 157 |
+
const margin = { top: 40, right: 50, bottom: 56, left: 20 };
|
| 158 |
+
|
| 159 |
+
// Scales
|
| 160 |
+
const xScale = d3.scaleLinear();
|
| 161 |
+
const yScale = d3.scaleBand();
|
| 162 |
+
|
| 163 |
+
// Data loading
|
| 164 |
+
const JSON_PATHS = [
|
| 165 |
+
'/data/reckless_guessing.json',
|
| 166 |
+
'./assets/data/reckless_guessing.json',
|
| 167 |
+
'../assets/data/reckless_guessing.json',
|
| 168 |
+
'../../assets/data/reckless_guessing.json'
|
| 169 |
+
];
|
| 170 |
+
|
| 171 |
+
const fetchFirstAvailable = async (paths) => {
|
| 172 |
+
for (const p of paths) {
|
| 173 |
+
try {
|
| 174 |
+
const r = await fetch(p, { cache: 'no-cache' });
|
| 175 |
+
if (r.ok) return await r.json();
|
| 176 |
+
} catch (_) {}
|
| 177 |
+
}
|
| 178 |
+
throw new Error('Data not found');
|
| 179 |
+
};
|
| 180 |
+
|
| 181 |
+
function updateSize() {
|
| 182 |
+
width = container.clientWidth || 800;
|
| 183 |
+
const numModels = data ? data.models.length : 10;
|
| 184 |
+
const barHeight = 36;
|
| 185 |
+
height = margin.top + margin.bottom + numModels * barHeight;
|
| 186 |
+
svg.attr('width', width).attr('height', height).attr('viewBox', `0 0 ${width} ${height}`);
|
| 187 |
+
gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
|
| 188 |
+
return {
|
| 189 |
+
innerWidth: width - margin.left - margin.right,
|
| 190 |
+
innerHeight: height - margin.top - margin.bottom
|
| 191 |
+
};
|
| 192 |
+
}
|
| 193 |
+
|
| 194 |
+
function showTooltip(event, d) {
|
| 195 |
+
const rect = container.getBoundingClientRect();
|
| 196 |
+
const x = event.clientX - rect.left;
|
| 197 |
+
const y = event.clientY - rect.top;
|
| 198 |
+
|
| 199 |
+
tip.innerHTML = `
|
| 200 |
+
<div class="model-name" style="color: ${d.color}">${d.name}</div>
|
| 201 |
+
<div class="metric">
|
| 202 |
+
<span class="metric-label">Double-Down Rate:</span>
|
| 203 |
+
<span class="metric-value">${(d.double_down_rate * 100).toFixed(0)}%</span>
|
| 204 |
+
</div>
|
| 205 |
+
<div class="metric">
|
| 206 |
+
<span class="metric-label">Wrong Guesses:</span>
|
| 207 |
+
<span class="metric-value">${d.wrong_guesses}</span>
|
| 208 |
+
</div>
|
| 209 |
+
<div class="metric">
|
| 210 |
+
<span class="metric-label">Next Turn Guesses:</span>
|
| 211 |
+
<span class="metric-value">${d.next_turn_guesses}</span>
|
| 212 |
+
</div>
|
| 213 |
+
<div class="metric">
|
| 214 |
+
<span class="metric-label">Max Streak:</span>
|
| 215 |
+
<span class="metric-value">${d.max_streak}</span>
|
| 216 |
+
</div>
|
| 217 |
+
<div class="metric">
|
| 218 |
+
<span class="metric-label">Type:</span>
|
| 219 |
+
<span class="metric-value">${d.is_open ? 'Open' : 'Closed'}</span>
|
| 220 |
+
</div>
|
| 221 |
+
`;
|
| 222 |
+
|
| 223 |
+
const tipWidth = tip.offsetWidth || 180;
|
| 224 |
+
const tipHeight = tip.offsetHeight || 120;
|
| 225 |
+
let tipX = x + 12;
|
| 226 |
+
let tipY = y - tipHeight / 2;
|
| 227 |
+
|
| 228 |
+
if (tipX + tipWidth > width) tipX = x - tipWidth - 12;
|
| 229 |
+
if (tipY < 0) tipY = 8;
|
| 230 |
+
if (tipY + tipHeight > height) tipY = height - tipHeight - 8;
|
| 231 |
+
|
| 232 |
+
tip.style.transform = `translate(${tipX}px, ${tipY}px)`;
|
| 233 |
+
tip.style.opacity = '1';
|
| 234 |
+
}
|
| 235 |
+
|
| 236 |
+
function hideTooltip() {
|
| 237 |
+
tip.style.opacity = '0';
|
| 238 |
+
tip.style.transform = 'translate(-9999px, -9999px)';
|
| 239 |
+
}
|
| 240 |
+
|
| 241 |
+
// Estimate perceived brightness (Rec. 601 luma weights) and return black or white for best contrast
|
| 242 |
+
function getContrastColor(hexColor) {
|
| 243 |
+
const hex = hexColor.replace('#', '');
|
| 244 |
+
const r = parseInt(hex.slice(0, 2), 16) / 255;
|
| 245 |
+
const g = parseInt(hex.slice(2, 4), 16) / 255;
|
| 246 |
+
const b = parseInt(hex.slice(4, 6), 16) / 255;
|
| 247 |
+
const luminance = 0.299 * r + 0.587 * g + 0.114 * b;
|
| 248 |
+
return luminance > 0.5 ? '#000000' : '#ffffff';
|
| 249 |
+
}
|
| 250 |
+
|
| 251 |
+
function render() {
|
| 252 |
+
if (!data) return;
|
| 253 |
+
|
| 254 |
+
const { innerWidth, innerHeight } = updateSize();
|
| 255 |
+
|
| 256 |
+
// Sort models by double_down_rate descending
|
| 257 |
+
const models = [...data.models].sort((a, b) => b.double_down_rate - a.double_down_rate);
|
| 258 |
+
|
| 259 |
+
// Update scales
|
| 260 |
+
xScale
|
| 261 |
+
.domain([0, 0.8])
|
| 262 |
+
.range([0, innerWidth]);
|
| 263 |
+
|
| 264 |
+
yScale
|
| 265 |
+
.domain(models.map(d => d.name))
|
| 266 |
+
.range([0, innerHeight])
|
| 267 |
+
.padding(0.25);
|
| 268 |
+
|
| 269 |
+
// Grid lines (vertical)
|
| 270 |
+
const xTicks = [0, 0.2, 0.4, 0.6, 0.8];
|
| 271 |
+
gGrid.selectAll('.grid-x')
|
| 272 |
+
.data(xTicks)
|
| 273 |
+
.join('line')
|
| 274 |
+
.attr('class', 'grid-x')
|
| 275 |
+
.attr('x1', d => xScale(d))
|
| 276 |
+
.attr('x2', d => xScale(d))
|
| 277 |
+
.attr('y1', 0)
|
| 278 |
+
.attr('y2', innerHeight);
|
| 279 |
+
|
| 280 |
+
// Title
|
| 281 |
+
gAxes.selectAll('.chart-title')
|
| 282 |
+
.data([0])
|
| 283 |
+
.join('text')
|
| 284 |
+
.attr('class', 'chart-title')
|
| 285 |
+
.attr('x', innerWidth / 2)
|
| 286 |
+
.attr('y', -20)
|
| 287 |
+
.attr('text-anchor', 'middle')
|
| 288 |
+
.text('After Wrong Guess: % Guessing Again Next Turn');
|
| 289 |
+
|
| 290 |
+
// X-axis (bottom)
|
| 291 |
+
gAxes.selectAll('.x-axis')
|
| 292 |
+
.data([0])
|
| 293 |
+
.join('g')
|
| 294 |
+
.attr('class', 'x-axis')
|
| 295 |
+
.attr('transform', `translate(0,${innerHeight})`)
|
| 296 |
+
.call(d3.axisBottom(xScale)
|
| 297 |
+
.tickValues(xTicks)
|
| 298 |
+
.tickFormat(d => `${Math.round(d * 100)}%`)
|
| 299 |
+
.tickSizeOuter(0));
|
| 300 |
+
|
| 301 |
+
// X-axis label
|
| 302 |
+
gAxes.selectAll('.x-label')
|
| 303 |
+
.data([0])
|
| 304 |
+
.join('text')
|
| 305 |
+
.attr('class', 'x-label axis-label')
|
| 306 |
+
.attr('x', innerWidth / 2)
|
| 307 |
+
.attr('y', innerHeight + 34)
|
| 308 |
+
.attr('text-anchor', 'middle')
|
| 309 |
+
.text('Double-Down Rate');
|
| 310 |
+
|
| 311 |
+
// Subtitle
|
| 312 |
+
gAxes.selectAll('.subtitle')
|
| 313 |
+
.data([0])
|
| 314 |
+
.join('text')
|
| 315 |
+
.attr('class', 'subtitle')
|
| 316 |
+
.attr('x', innerWidth / 2)
|
| 317 |
+
.attr('y', innerHeight + 48)
|
| 318 |
+
.attr('text-anchor', 'middle')
|
| 319 |
+
.text('Higher = more reckless (keeps guessing after failures)');
|
| 320 |
+
|
| 321 |
+
// Bars
|
| 322 |
+
const barHeight = yScale.bandwidth();
|
| 323 |
+
|
| 324 |
+
// All models with filled bars
|
| 325 |
+
gBars.selectAll('.bar')
|
| 326 |
+
.data(models, d => d.name)
|
| 327 |
+
.join('rect')
|
| 328 |
+
.attr('class', 'bar')
|
| 329 |
+
.attr('x', 0)
|
| 330 |
+
.attr('y', d => yScale(d.name))
|
| 331 |
+
.attr('width', d => xScale(d.double_down_rate))
|
| 332 |
+
.attr('height', barHeight)
|
| 333 |
+
.attr('fill', d => d.color)
|
| 334 |
+
.attr('rx', 3)
|
| 335 |
+
.attr('ry', 3)
|
| 336 |
+
.on('mouseenter', showTooltip)
|
| 337 |
+
.on('mousemove', showTooltip)
|
| 338 |
+
.on('mouseleave', hideTooltip);
|
| 339 |
+
|
| 340 |
+
// Model labels (inside bars)
|
| 341 |
+
gLabels.selectAll('.model-label')
|
| 342 |
+
.data(models, d => d.name)
|
| 343 |
+
.join('text')
|
| 344 |
+
.attr('class', 'model-label')
|
| 345 |
+
.attr('x', 8)
|
| 346 |
+
.attr('y', d => yScale(d.name) + barHeight / 2)
|
| 347 |
+
.attr('dy', '0.35em')
|
| 348 |
+
.attr('text-anchor', 'start')
|
| 349 |
+
.style('fill', d => getContrastColor(d.color))
|
| 350 |
+
.text(d => d.name);
|
| 351 |
+
|
| 352 |
+
// Percentage labels (end of bars)
|
| 353 |
+
gLabels.selectAll('.percent-label')
|
| 354 |
+
.data(models, d => d.name)
|
| 355 |
+
.join('text')
|
| 356 |
+
.attr('class', 'percent-label')
|
| 357 |
+
.attr('x', d => xScale(d.double_down_rate) + 6)
|
| 358 |
+
.attr('y', d => yScale(d.name) + barHeight / 2)
|
| 359 |
+
.attr('dy', '0.35em')
|
| 360 |
+
.attr('text-anchor', 'start')
|
| 361 |
+
.text(d => `${Math.round(d.double_down_rate * 100)}%`);
|
| 362 |
+
|
| 363 |
+
}
|
| 364 |
+
|
| 365 |
+
// Initialize
|
| 366 |
+
fetchFirstAvailable(JSON_PATHS)
|
| 367 |
+
.then(json => {
|
| 368 |
+
data = json;
|
| 369 |
+
render();
|
| 370 |
+
})
|
| 371 |
+
.catch(err => {
|
| 372 |
+
const pre = document.createElement('pre');
|
| 373 |
+
pre.style.color = 'red';
|
| 374 |
+
pre.style.padding = '16px';
|
| 375 |
+
pre.textContent = `Error loading data: ${err.message}`;
|
| 376 |
+
container.appendChild(pre);
|
| 377 |
+
});
|
| 378 |
+
|
| 379 |
+
// Resize handling
|
| 380 |
+
if (window.ResizeObserver) {
|
| 381 |
+
new ResizeObserver(() => render()).observe(container);
|
| 382 |
+
} else {
|
| 383 |
+
window.addEventListener('resize', render);
|
| 384 |
+
}
|
| 385 |
+
|
| 386 |
+
// Theme change handling
|
| 387 |
+
const observer = new MutationObserver(() => render());
|
| 388 |
+
observer.observe(document.documentElement, {
|
| 389 |
+
attributes: true,
|
| 390 |
+
attributeFilter: ['data-theme']
|
| 391 |
+
});
|
| 392 |
+
};
|
| 393 |
+
|
| 394 |
+
if (document.readyState === 'loading') {
|
| 395 |
+
document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
|
| 396 |
+
} else {
|
| 397 |
+
ensureD3(bootstrap);
|
| 398 |
+
}
|
| 399 |
+
})();
|
| 400 |
+
</script>
|
app/src/content/embeds/score-stack.html
ADDED
|
@@ -0,0 +1,440 @@
|
| 1 |
+
<div class="d3-score-stack"></div>
|
| 2 |
+
<style>
|
| 3 |
+
.d3-score-stack {
|
| 4 |
+
width: 100%;
|
| 5 |
+
margin: 10px 0;
|
| 6 |
+
position: relative;
|
| 7 |
+
font-family: system-ui, -apple-system, sans-serif;
|
| 8 |
+
}
|
| 9 |
+
|
| 10 |
+
.d3-score-stack svg {
|
| 11 |
+
display: block;
|
| 12 |
+
width: 100%;
|
| 13 |
+
height: auto;
|
| 14 |
+
}
|
| 15 |
+
|
| 16 |
+
.d3-score-stack .axes path,
|
| 17 |
+
.d3-score-stack .axes line {
|
| 18 |
+
stroke: var(--axis-color, var(--text-color));
|
| 19 |
+
}
|
| 20 |
+
|
| 21 |
+
.d3-score-stack .axes text {
|
| 22 |
+
fill: var(--tick-color, var(--muted-color));
|
| 23 |
+
font-size: 11px;
|
| 24 |
+
}
|
| 25 |
+
|
| 26 |
+
.d3-score-stack .grid line {
|
| 27 |
+
stroke: var(--grid-color, rgba(0,0,0,.08));
|
| 28 |
+
}
|
| 29 |
+
|
| 30 |
+
.d3-score-stack .axes text.axis-label {
|
| 31 |
+
font-size: 15px;
|
| 32 |
+
font-weight: 500;
|
| 33 |
+
fill: var(--text-color);
|
| 34 |
+
}
|
| 35 |
+
|
| 36 |
+
.d3-score-stack .bar-segment {
|
| 37 |
+
cursor: pointer;
|
| 38 |
+
transition: opacity 0.15s ease;
|
| 39 |
+
}
|
| 40 |
+
|
| 41 |
+
.d3-score-stack .bar-segment:hover {
|
| 42 |
+
opacity: 0.8;
|
| 43 |
+
}
|
| 44 |
+
|
| 45 |
+
.d3-score-stack .model-label {
|
| 46 |
+
font-size: 12px;
|
| 47 |
+
fill: var(--text-color);
|
| 48 |
+
}
|
| 49 |
+
|
| 50 |
+
.d3-score-stack .d3-tooltip {
|
| 51 |
+
position: absolute;
|
| 52 |
+
top: 0;
|
| 53 |
+
left: 0;
|
| 54 |
+
transform: translate(-9999px, -9999px);
|
| 55 |
+
pointer-events: none;
|
| 56 |
+
padding: 10px 12px;
|
| 57 |
+
border-radius: 8px;
|
| 58 |
+
font-size: 12px;
|
| 59 |
+
line-height: 1.4;
|
| 60 |
+
border: 1px solid var(--border-color);
|
| 61 |
+
background: var(--surface-bg);
|
| 62 |
+
color: var(--text-color);
|
| 63 |
+
box-shadow: 0 4px 24px rgba(0,0,0,.18);
|
| 64 |
+
opacity: 0;
|
| 65 |
+
transition: opacity 0.12s ease;
|
| 66 |
+
z-index: 10;
|
| 67 |
+
}
|
| 68 |
+
|
| 69 |
+
.d3-score-stack .d3-tooltip .model-name {
|
| 70 |
+
font-weight: 600;
|
| 71 |
+
margin-bottom: 4px;
|
| 72 |
+
}
|
| 73 |
+
|
| 74 |
+
.d3-score-stack .d3-tooltip .metric {
|
| 75 |
+
display: flex;
|
| 76 |
+
justify-content: space-between;
|
| 77 |
+
gap: 16px;
|
| 78 |
+
}
|
| 79 |
+
|
| 80 |
+
.d3-score-stack .d3-tooltip .metric-label {
|
| 81 |
+
color: var(--muted-color);
|
| 82 |
+
}
|
| 83 |
+
|
| 84 |
+
.d3-score-stack .d3-tooltip .metric-value {
|
| 85 |
+
font-weight: 500;
|
| 86 |
+
}
|
| 87 |
+
|
| 88 |
+
.d3-score-stack .legend {
|
| 89 |
+
display: flex;
|
| 90 |
+
flex-wrap: wrap;
|
| 91 |
+
justify-content: center;
|
| 92 |
+
gap: 16px;
|
| 93 |
+
margin-top: 12px;
|
| 94 |
+
font-size: 12px;
|
| 95 |
+
}
|
| 96 |
+
|
| 97 |
+
.d3-score-stack .legend-item {
|
| 98 |
+
display: flex;
|
| 99 |
+
align-items: center;
|
| 100 |
+
gap: 6px;
|
| 101 |
+
}
|
| 102 |
+
|
| 103 |
+
.d3-score-stack .legend-swatch {
|
| 104 |
+
width: 14px;
|
| 105 |
+
height: 14px;
|
| 106 |
+
border-radius: 2px;
|
| 107 |
+
}
|
| 108 |
+
|
| 109 |
+
.d3-score-stack .legend-label {
|
| 110 |
+
color: var(--text-color);
|
| 111 |
+
}
|
| 112 |
+
</style>
|
| 113 |
+
<script>
|
| 114 |
+
(() => {
|
| 115 |
+
const ensureD3 = (cb) => {
|
| 116 |
+
if (window.d3 && typeof window.d3.select === 'function') return cb();
|
| 117 |
+
let s = document.getElementById('d3-cdn-script');
|
| 118 |
+
if (!s) {
|
| 119 |
+
s = document.createElement('script');
|
| 120 |
+
s.id = 'd3-cdn-script';
|
| 121 |
+
s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
|
| 122 |
+
document.head.appendChild(s);
|
| 123 |
+
}
|
| 124 |
+
const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
|
| 125 |
+
s.addEventListener('load', onReady, { once: true });
|
| 126 |
+
if (window.d3) onReady();
|
| 127 |
+
};
|
| 128 |
+
|
| 129 |
+
const bootstrap = () => {
|
| 130 |
+
const scriptEl = document.currentScript;
|
| 131 |
+
let container = scriptEl ? scriptEl.previousElementSibling : null;
|
| 132 |
+
if (!(container && container.classList && container.classList.contains('d3-score-stack'))) {
|
| 133 |
+
const candidates = Array.from(document.querySelectorAll('.d3-score-stack'))
|
| 134 |
+
.filter((el) => !(el.dataset && el.dataset.mounted === 'true'));
|
| 135 |
+
container = candidates[candidates.length - 1] || null;
|
| 136 |
+
}
|
| 137 |
+
if (!container) return;
|
| 138 |
+
if (container.dataset) {
|
| 139 |
+
if (container.dataset.mounted === 'true') return;
|
| 140 |
+
container.dataset.mounted = 'true';
|
| 141 |
+
}
|
| 142 |
+
|
| 143 |
+
// Tooltip setup
|
| 144 |
+
container.style.position = container.style.position || 'relative';
|
| 145 |
+
const tip = document.createElement('div');
|
| 146 |
+
tip.className = 'd3-tooltip';
|
| 147 |
+
container.appendChild(tip);
|
| 148 |
+
|
| 149 |
+
// SVG setup
|
| 150 |
+
const svg = d3.select(container).append('svg');
|
| 151 |
+
const gRoot = svg.append('g');
|
| 152 |
+
|
| 153 |
+
// Chart groups
|
| 154 |
+
const gGrid = gRoot.append('g').attr('class', 'grid');
|
| 155 |
+
const gAxes = gRoot.append('g').attr('class', 'axes');
|
| 156 |
+
const gBars = gRoot.append('g').attr('class', 'bars');
|
| 157 |
+
|
| 158 |
+
// Legend container
|
| 159 |
+
const legendDiv = document.createElement('div');
|
| 160 |
+
legendDiv.className = 'legend';
|
| 161 |
+
container.appendChild(legendDiv);
|
| 162 |
+
|
| 163 |
+
// State
|
| 164 |
+
let data = null;
|
| 165 |
+
let width = 800;
|
| 166 |
+
let height = 500;
|
| 167 |
+
const margin = { top: 20, right: 30, bottom: 56, left: 160 };
|
| 168 |
+
|
| 169 |
+
// Colors for segments
|
| 170 |
+
const segmentColors = {
|
| 171 |
+
raw: '#4A90D9', // Blue - raw score
|
| 172 |
+
floored: '#E8973E', // Orange - flooring gain
|
| 173 |
+
noStakes: '#5AAA5A' // Green - no-stakes gain
|
| 174 |
+
};
|
| 175 |
+
|
| 176 |
+
// Scales
|
| 177 |
+
const xScale = d3.scaleLinear();
|
| 178 |
+
const yScale = d3.scaleBand();
|
| 179 |
+
|
| 180 |
+
// Data loading
|
| 181 |
+
const DATA_URL = '/data/score_stack.json';
|
| 182 |
+
|
| 183 |
+
function updateSize() {
|
| 184 |
+
width = container.clientWidth || 800;
|
| 185 |
+
const barCount = data ? data.models.length : 10;
|
| 186 |
+
height = Math.max(400, barCount * 44 + margin.top + margin.bottom);
|
| 187 |
+
svg.attr('width', width).attr('height', height).attr('viewBox', `0 0 ${width} ${height}`);
|
| 188 |
+
gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
|
| 189 |
+
return {
|
| 190 |
+
innerWidth: width - margin.left - margin.right,
|
| 191 |
+
innerHeight: height - margin.top - margin.bottom
|
| 192 |
+
};
|
| 193 |
+
}
|
| 194 |
+
|
| 195 |
+
function showTooltip(event, d, segment) {
|
| 196 |
+
const rect = container.getBoundingClientRect();
|
| 197 |
+
const x = event.clientX - rect.left;
|
| 198 |
+
const y = event.clientY - rect.top;
|
| 199 |
+
|
| 200 |
+
let segmentName, segmentValue, description;
|
| 201 |
+
if (segment === 'raw') {
|
| 202 |
+
segmentName = 'Raw Score';
|
| 203 |
+
segmentValue = d.avg_score.toFixed(2);
|
| 204 |
+
description = 'Standard scoring: 30 - turns - 2×wrong guesses';
|
| 205 |
+
} else if (segment === 'floored') {
|
| 206 |
+
segmentName = 'Flooring Gain';
|
| 207 |
+
segmentValue = '+' + d.floored_delta.toFixed(2);
|
| 208 |
+
description = 'Gain if negative scores count as 0';
|
| 209 |
+
} else {
|
| 210 |
+
segmentName = 'No-Stakes Gain';
|
| 211 |
+
segmentValue = '+' + d.no_stakes_delta.toFixed(2);
|
| 212 |
+
description = 'Additional gain without guess penalties';
|
| 213 |
+
}
|
| 214 |
+
|
| 215 |
+
tip.innerHTML = `
|
| 216 |
+
<div class="model-name" style="color: ${d.color}">${d.name}</div>
|
| 217 |
+
<div class="metric">
|
| 218 |
+
<span class="metric-label">${segmentName}:</span>
|
| 219 |
+
<span class="metric-value">${segmentValue}</span>
|
| 220 |
+
</div>
|
| 221 |
+
<div style="font-size: 11px; color: var(--muted-color); margin-top: 4px;">${description}</div>
|
| 222 |
+
<hr style="border: none; border-top: 1px solid var(--border-color); margin: 8px 0;">
|
| 223 |
+
<div class="metric">
|
| 224 |
+
<span class="metric-label">Raw Score:</span>
|
| 225 |
+
<span class="metric-value">${d.avg_score.toFixed(2)}</span>
|
| 226 |
+
</div>
|
| 227 |
+
<div class="metric">
|
| 228 |
+
<span class="metric-label">Floored Score:</span>
|
| 229 |
+
<span class="metric-value">${d.avg_floored_score.toFixed(2)}</span>
|
| 230 |
+
</div>
|
| 231 |
+
<div class="metric">
|
| 232 |
+
<span class="metric-label">No-Stakes Score:</span>
|
| 233 |
+
<span class="metric-value">${d.avg_no_stakes_score.toFixed(2)}</span>
|
| 234 |
+
</div>
|
| 235 |
+
`;
|
| 236 |
+
|
| 237 |
+
const tipWidth = tip.offsetWidth || 200;
|
| 238 |
+
const tipHeight = tip.offsetHeight || 150;
|
| 239 |
+
let tipX = x + 12;
|
| 240 |
+
let tipY = y - tipHeight / 2;
|
| 241 |
+
|
| 242 |
+
if (tipX + tipWidth > width) tipX = x - tipWidth - 12;
|
| 243 |
+
if (tipY < 0) tipY = 8;
|
| 244 |
+
if (tipY + tipHeight > height) tipY = height - tipHeight - 8;
|
| 245 |
+
|
| 246 |
+
tip.style.transform = `translate(${tipX}px, ${tipY}px)`;
|
| 247 |
+
tip.style.opacity = '1';
|
| 248 |
+
}
|
| 249 |
+
|
| 250 |
+
function hideTooltip() {
|
| 251 |
+
tip.style.opacity = '0';
|
| 252 |
+
tip.style.transform = 'translate(-9999px, -9999px)';
|
| 253 |
+
}
|
| 254 |
+
|
| 255 |
+
function render() {
|
| 256 |
+
if (!data) return;
|
| 257 |
+
|
| 258 |
+
const { innerWidth, innerHeight } = updateSize();
|
| 259 |
+
|
| 260 |
+
// Sort models by raw score (descending)
|
| 261 |
+
const models = [...data.models].sort((a, b) => b.avg_score - a.avg_score);
|
| 262 |
+
|
| 263 |
+
// Update scales
|
| 264 |
+
const maxScore = d3.max(models, d => d.avg_no_stakes_score);
|
| 265 |
+
|
| 266 |
+
xScale
|
| 267 |
+
.domain([0, maxScore + 1])
|
| 268 |
+
.range([0, innerWidth])
|
| 269 |
+
.nice();
|
| 270 |
+
|
| 271 |
+
yScale
|
| 272 |
+
.domain(models.map(d => d.name))
|
| 273 |
+
.range([0, innerHeight])
|
| 274 |
+
.padding(0.25);
|
| 275 |
+
|
| 276 |
+
// Grid lines
|
| 277 |
+
const xTicks = xScale.ticks(8);
|
| 278 |
+
|
| 279 |
+
gGrid.selectAll('.grid-x')
|
| 280 |
+
.data(xTicks)
|
| 281 |
+
.join('line')
|
| 282 |
+
.attr('class', 'grid-x')
|
| 283 |
+
.attr('x1', d => xScale(d))
|
| 284 |
+
.attr('x2', d => xScale(d))
|
| 285 |
+
.attr('y1', 0)
|
| 286 |
+
.attr('y2', innerHeight);
|
| 287 |
+
|
| 288 |
+
// Axes
|
| 289 |
+
const tickSize = 6;
|
| 290 |
+
gAxes.selectAll('.x-axis')
|
| 291 |
+
.data([0])
|
| 292 |
+
.join('g')
|
| 293 |
+
.attr('class', 'x-axis')
|
| 294 |
+
.attr('transform', `translate(0,${innerHeight})`)
|
| 295 |
+
.call(d3.axisBottom(xScale).ticks(8).tickSizeInner(-tickSize).tickSizeOuter(0));
|
| 296 |
+
|
| 297 |
+
gAxes.selectAll('.y-axis')
|
| 298 |
+
.data([0])
|
| 299 |
+
.join('g')
|
| 300 |
+
.attr('class', 'y-axis')
|
| 301 |
+
.call(d3.axisLeft(yScale).tickSize(0))
|
| 302 |
+
.selectAll('text')
|
| 303 |
+
.attr('class', 'model-label');
|
| 304 |
+
|
| 305 |
+
// Axis label
|
| 306 |
+
gAxes.selectAll('.x-label')
|
| 307 |
+
.data([0])
|
| 308 |
+
.join('text')
|
| 309 |
+
.attr('class', 'x-label axis-label')
|
| 310 |
+
.attr('x', innerWidth / 2)
|
| 311 |
+
.attr('y', innerHeight + 44)
|
| 312 |
+
.attr('text-anchor', 'middle')
|
| 313 |
+
.text('Score');
|
| 314 |
+
|
| 315 |
+
const barHeight = yScale.bandwidth();
|
| 316 |
+
|
| 317 |
+
// Helper to make names safe for CSS class selectors (replaces every non-alphanumeric character, such as periods and spaces, with a hyphen)
|
| 318 |
+
const toClassName = (name) => name.replace(/[^a-zA-Z0-9]/g, '-');
|
| 319 |
+
|
| 320 |
+
// Draw stacked bars for each model
|
| 321 |
+
models.forEach(d => {
|
| 322 |
+
const y = yScale(d.name);
|
| 323 |
+
const safeId = toClassName(d.name);
|
| 324 |
+
|
| 325 |
+
// Calculate segment positions
|
| 326 |
+
// Raw score starts from 0, clamp negative scores to 0
|
| 327 |
+
const rawStart = 0;
|
| 328 |
+
const rawEnd = Math.max(0, d.avg_score);
|
| 329 |
+
|
| 330 |
+
// Floored delta starts where raw score ends (if positive) or at 0 (if raw was negative)
|
| 331 |
+
const flooredStart = rawEnd;
|
| 332 |
+
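const flooredEnd = Math.max(flooredStart, d.avg_floored_score); // end at the floored score itself so a negative raw score cannot push the segment past it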
|
| 333 |
+
|
| 334 |
+
// No-stakes delta starts where floored ends
|
| 335 |
+
const noStakesStart = flooredEnd;
|
| 336 |
+
const noStakesEnd = noStakesStart + d.no_stakes_delta;
|
| 337 |
+
|
| 338 |
+
// Raw score segment
|
| 339 |
+
gBars.selectAll(`.bar-raw-${safeId}`)
|
| 340 |
+
.data([d])
|
| 341 |
+
.join('rect')
|
| 342 |
+
.attr('class', `bar-segment bar-raw-${safeId}`)
|
| 343 |
+
.attr('x', xScale(rawStart))
|
| 344 |
+
.attr('y', y)
|
| 345 |
+
.attr('width', Math.max(0, xScale(rawEnd) - xScale(rawStart)))
|
| 346 |
+
.attr('height', barHeight)
|
| 347 |
+
.attr('fill', segmentColors.raw)
|
| 348 |
+
.on('mouseenter', (e) => showTooltip(e, d, 'raw'))
|
| 349 |
+
.on('mousemove', (e) => showTooltip(e, d, 'raw'))
|
| 350 |
+
.on('mouseleave', hideTooltip);
|
| 351 |
+
|
| 352 |
+
// Floored delta segment (only if positive)
|
| 353 |
+
if (d.floored_delta > 0.01) {
|
| 354 |
+
gBars.selectAll(`.bar-floored-${safeId}`)
|
| 355 |
+
.data([d])
|
| 356 |
+
.join('rect')
|
| 357 |
+
.attr('class', `bar-segment bar-floored-${safeId}`)
|
| 358 |
+
.attr('x', xScale(flooredStart))
|
| 359 |
+
.attr('y', y)
|
| 360 |
+
.attr('width', Math.max(0, xScale(flooredEnd) - xScale(flooredStart)))
|
| 361 |
+
.attr('height', barHeight)
|
| 362 |
+
.attr('fill', segmentColors.floored)
|
| 363 |
+
.attr('opacity', 0.5)
|
| 364 |
+
.on('mouseenter', (e) => showTooltip(e, d, 'floored'))
|
| 365 |
+
.on('mousemove', (e) => showTooltip(e, d, 'floored'))
|
| 366 |
+
.on('mouseleave', hideTooltip);
|
| 367 |
+
}
|
| 368 |
+
|
| 369 |
+
// No-stakes delta segment (only if positive)
|
| 370 |
+
if (d.no_stakes_delta > 0.01) {
|
| 371 |
+
gBars.selectAll(`.bar-nostakes-${safeId}`)
|
| 372 |
+
.data([d])
|
| 373 |
+
.join('rect')
|
| 374 |
+
.attr('class', `bar-segment bar-nostakes-${safeId}`)
|
| 375 |
+
.attr('x', xScale(noStakesStart))
|
| 376 |
+
.attr('y', y)
|
| 377 |
+
.attr('width', Math.max(0, xScale(noStakesEnd) - xScale(noStakesStart)))
|
| 378 |
+
.attr('height', barHeight)
|
| 379 |
+
.attr('fill', segmentColors.noStakes)
|
| 380 |
+
.attr('opacity', 0.5)
|
| 381 |
+
.on('mouseenter', (e) => showTooltip(e, d, 'noStakes'))
|
| 382 |
+
.on('mousemove', (e) => showTooltip(e, d, 'noStakes'))
|
| 383 |
+
.on('mouseleave', hideTooltip);
|
| 384 |
+
}
|
| 385 |
+
});
|
| 386 |
+
|
| 387 |
+
// Update legend
|
| 388 |
+
legendDiv.innerHTML = `
|
| 389 |
+
<div class="legend-item">
|
| 390 |
+
<div class="legend-swatch" style="background: ${segmentColors.raw}"></div>
|
| 391 |
+
<span class="legend-label">Raw Score</span>
|
| 392 |
+
</div>
|
| 393 |
+
<div class="legend-item">
|
| 394 |
+
<div class="legend-swatch" style="background: ${segmentColors.floored}"></div>
|
| 395 |
+
<span class="legend-label">Flooring Gain</span>
|
| 396 |
+
</div>
|
| 397 |
+
<div class="legend-item">
|
| 398 |
+
<div class="legend-swatch" style="background: ${segmentColors.noStakes}"></div>
|
| 399 |
+
<span class="legend-label">No-Stakes Gain</span>
|
| 400 |
+
</div>
|
| 401 |
+
`;
|
| 402 |
+
}
|
| 403 |
+
|
| 404 |
+
// Initialize
|
| 405 |
+
fetch(DATA_URL, { cache: 'no-cache' })
|
| 406 |
+
.then(r => { if (!r.ok) throw new Error(`HTTP ${r.status}`); return r.json(); })
|
| 407 |
+
.then(json => {
|
| 408 |
+
data = json;
|
| 409 |
+
render();
|
| 410 |
+
})
|
| 411 |
+
.catch(err => {
|
| 412 |
+
const pre = document.createElement('pre');
|
| 413 |
+
pre.style.color = 'red';
|
| 414 |
+
pre.style.padding = '16px';
|
| 415 |
+
pre.textContent = `Error loading data: ${err.message}`;
|
| 416 |
+
container.appendChild(pre);
|
| 417 |
+
});
|
| 418 |
+
|
| 419 |
+
// Resize handling
|
| 420 |
+
if (window.ResizeObserver) {
|
| 421 |
+
new ResizeObserver(() => render()).observe(container);
|
| 422 |
+
} else {
|
| 423 |
+
window.addEventListener('resize', render);
|
| 424 |
+
}
|
| 425 |
+
|
| 426 |
+
// Theme change handling
|
| 427 |
+
const observer = new MutationObserver(() => render());
|
| 428 |
+
observer.observe(document.documentElement, {
|
| 429 |
+
attributes: true,
|
| 430 |
+
attributeFilter: ['data-theme']
|
| 431 |
+
});
|
| 432 |
+
};
|
| 433 |
+
|
| 434 |
+
if (document.readyState === 'loading') {
|
| 435 |
+
document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
|
| 436 |
+
} else {
|
| 437 |
+
ensureD3(bootstrap);
|
| 438 |
+
}
|
| 439 |
+
})();
|
| 440 |
+
</script>
|
app/src/content/embeds/score-vs-failed-guesses.html
ADDED
|
@@ -0,0 +1,369 @@
|
| 1 |
+
<div class="d3-score-vs-failed-guesses"></div>
|
| 2 |
+
<style>
|
| 3 |
+
.d3-score-vs-failed-guesses {
|
| 4 |
+
width: 100%;
|
| 5 |
+
margin: 10px 0;
|
| 6 |
+
position: relative;
|
| 7 |
+
font-family: system-ui, -apple-system, sans-serif;
|
| 8 |
+
}
|
| 9 |
+
|
| 10 |
+
.d3-score-vs-failed-guesses svg {
|
| 11 |
+
display: block;
|
| 12 |
+
width: 100%;
|
| 13 |
+
height: auto;
|
| 14 |
+
}
|
| 15 |
+
|
| 16 |
+
.d3-score-vs-failed-guesses .axes path,
|
| 17 |
+
.d3-score-vs-failed-guesses .axes line {
|
| 18 |
+
stroke: var(--axis-color, var(--text-color));
|
| 19 |
+
}
|
| 20 |
+
|
| 21 |
+
.d3-score-vs-failed-guesses .axes text {
|
| 22 |
+
fill: var(--tick-color, var(--muted-color));
|
| 23 |
+
font-size: 11px;
|
| 24 |
+
}
|
| 25 |
+
|
| 26 |
+
.d3-score-vs-failed-guesses .grid line {
|
| 27 |
+
stroke: var(--grid-color, rgba(0,0,0,.08));
|
| 28 |
+
}
|
| 29 |
+
|
| 30 |
+
.d3-score-vs-failed-guesses .axes text.axis-label {
|
| 31 |
+
font-size: 15px;
|
| 32 |
+
font-weight: 500;
|
| 33 |
+
fill: var(--text-color);
|
| 34 |
+
}
|
| 35 |
+
|
| 36 |
+
.d3-score-vs-failed-guesses .x-axis text {
|
| 37 |
+
transform: translateY(4px);
|
| 38 |
+
}
|
| 39 |
+
|
| 40 |
+
.d3-score-vs-failed-guesses .point {
|
| 41 |
+
cursor: pointer;
|
| 42 |
+
transition: opacity 0.15s ease;
|
| 43 |
+
}
|
| 44 |
+
|
| 45 |
+
.d3-score-vs-failed-guesses .point:hover {
|
| 46 |
+
opacity: 0.8;
|
| 47 |
+
}
|
| 48 |
+
|
| 49 |
+
.d3-score-vs-failed-guesses .point-label {
|
| 50 |
+
font-size: 11px;
|
| 51 |
+
fill: var(--text-color);
|
| 52 |
+
pointer-events: none;
|
| 53 |
+
}
|
| 54 |
+
|
| 55 |
+
.d3-score-vs-failed-guesses .d3-tooltip {
|
| 56 |
+
position: absolute;
|
| 57 |
+
top: 0;
|
| 58 |
+
left: 0;
|
| 59 |
+
transform: translate(-9999px, -9999px);
|
| 60 |
+
pointer-events: none;
|
| 61 |
+
padding: 10px 12px;
|
| 62 |
+
border-radius: 8px;
|
| 63 |
+
font-size: 12px;
|
| 64 |
+
line-height: 1.4;
|
| 65 |
+
border: 1px solid var(--border-color);
|
| 66 |
+
background: var(--surface-bg);
|
| 67 |
+
color: var(--text-color);
|
| 68 |
+
box-shadow: 0 4px 24px rgba(0,0,0,.18);
|
| 69 |
+
opacity: 0;
|
| 70 |
+
transition: opacity 0.12s ease;
|
| 71 |
+
z-index: 10;
|
| 72 |
+
}
|
| 73 |
+
|
| 74 |
+
.d3-score-vs-failed-guesses .d3-tooltip .model-name {
|
| 75 |
+
font-weight: 600;
|
| 76 |
+
margin-bottom: 4px;
|
| 77 |
+
}
|
| 78 |
+
|
| 79 |
+
.d3-score-vs-failed-guesses .d3-tooltip .metric {
|
| 80 |
+
display: flex;
|
| 81 |
+
justify-content: space-between;
|
| 82 |
+
gap: 16px;
|
| 83 |
+
}
|
| 84 |
+
|
| 85 |
+
.d3-score-vs-failed-guesses .d3-tooltip .metric-label {
|
| 86 |
+
color: var(--muted-color);
|
| 87 |
+
}
|
| 88 |
+
|
| 89 |
+
.d3-score-vs-failed-guesses .d3-tooltip .metric-value {
|
| 90 |
+
font-weight: 500;
|
| 91 |
+
}
|
| 92 |
+
</style>
|
| 93 |
+
<script>
|
| 94 |
+
(() => {
|
| 95 |
+
const ensureD3 = (cb) => {
|
| 96 |
+
if (window.d3 && typeof window.d3.select === 'function') return cb();
|
| 97 |
+
let s = document.getElementById('d3-cdn-script');
|
| 98 |
+
if (!s) {
|
| 99 |
+
s = document.createElement('script');
|
| 100 |
+
s.id = 'd3-cdn-script';
|
| 101 |
+
s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
|
| 102 |
+
document.head.appendChild(s);
|
| 103 |
+
}
|
| 104 |
+
const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
|
| 105 |
+
s.addEventListener('load', onReady, { once: true });
|
| 106 |
+
if (window.d3) onReady();
|
| 107 |
+
};
|
| 108 |
+
|
| 109 |
+
const bootstrap = () => {
|
| 110 |
+
const scriptEl = document.currentScript;
|
| 111 |
+
let container = scriptEl ? scriptEl.previousElementSibling : null;
|
| 112 |
+
if (!(container && container.classList && container.classList.contains('d3-score-vs-failed-guesses'))) {
|
| 113 |
+
const candidates = Array.from(document.querySelectorAll('.d3-score-vs-failed-guesses'))
|
| 114 |
+
.filter((el) => !(el.dataset && el.dataset.mounted === 'true'));
|
| 115 |
+
container = candidates[candidates.length - 1] || null;
|
| 116 |
+
}
|
| 117 |
+
if (!container) return;
|
| 118 |
+
if (container.dataset) {
|
| 119 |
+
if (container.dataset.mounted === 'true') return;
|
| 120 |
+
container.dataset.mounted = 'true';
|
| 121 |
+
}
|
| 122 |
+
|
| 123 |
+
// Tooltip setup
|
| 124 |
+
container.style.position = container.style.position || 'relative';
|
| 125 |
+
const tip = document.createElement('div');
|
| 126 |
+
tip.className = 'd3-tooltip';
|
| 127 |
+
container.appendChild(tip);
|
| 128 |
+
|
| 129 |
+
// SVG setup
|
| 130 |
+
const svg = d3.select(container).append('svg');
|
| 131 |
+
const gRoot = svg.append('g');
|
| 132 |
+
|
| 133 |
+
// Chart groups
|
| 134 |
+
const gGrid = gRoot.append('g').attr('class', 'grid');
|
| 135 |
+
const gAxes = gRoot.append('g').attr('class', 'axes');
|
| 136 |
+
const gPoints = gRoot.append('g').attr('class', 'points');
|
| 137 |
+
const gLabels = gRoot.append('g').attr('class', 'labels');
|
| 138 |
+
|
| 139 |
+
// State
|
| 140 |
+
let data = null;
|
| 141 |
+
let width = 800;
|
| 142 |
+
let height = 450;
|
| 143 |
+
const margin = { top: 20, right: 120, bottom: 56, left: 72 };
|
| 144 |
+
|
| 145 |
+
// Scales
|
| 146 |
+
const xScale = d3.scaleLinear();
|
| 147 |
+
const yScale = d3.scaleLinear();
|
| 148 |
+
|
| 149 |
+
// Data loading
|
| 150 |
+
const DATA_URL = '/data/score_vs_failed_guesses.json';
|
| 151 |
+
|
| 152 |
+
function updateSize() {
|
| 153 |
+
width = container.clientWidth || 800;
|
| 154 |
+
height = Math.max(300, Math.round(width / 1.3));
|
| 155 |
+
svg.attr('width', width).attr('height', height).attr('viewBox', `0 0 ${width} ${height}`);
|
| 156 |
+
gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
|
| 157 |
+
return {
|
| 158 |
+
innerWidth: width - margin.left - margin.right,
|
| 159 |
+
innerHeight: height - margin.top - margin.bottom
|
| 160 |
+
};
|
| 161 |
+
}
|
| 162 |
+
|
| 163 |
+
function showTooltip(event, d) {
|
| 164 |
+
const rect = container.getBoundingClientRect();
|
| 165 |
+
const x = event.clientX - rect.left;
|
| 166 |
+
const y = event.clientY - rect.top;
|
| 167 |
+
|
| 168 |
+
tip.innerHTML = `
|
| 169 |
+
<div class="model-name" style="color: ${d.color}">${d.name}</div>
|
| 170 |
+
<div class="metric">
|
| 171 |
+
<span class="metric-label">Score:</span>
|
| 172 |
+
<span class="metric-value">${d.avg_score.toFixed(2)}</span>
|
| 173 |
+
</div>
|
| 174 |
+
<div class="metric">
|
| 175 |
+
<span class="metric-label">Failed Guesses:</span>
|
| 176 |
+
<span class="metric-value">${d.avg_failed_guesses.toFixed(2)}</span>
|
| 177 |
+
</div>
|
| 178 |
+
<div class="metric">
|
| 179 |
+
<span class="metric-label">Type:</span>
|
| 180 |
+
<span class="metric-value">${d.is_open ? 'Open' : 'Closed'}</span>
|
| 181 |
+
</div>
|
| 182 |
+
`;
|
| 183 |
+
|
| 184 |
+
const tipWidth = tip.offsetWidth || 150;
|
| 185 |
+
const tipHeight = tip.offsetHeight || 80;
|
| 186 |
+
let tipX = x + 12;
|
| 187 |
+
let tipY = y - tipHeight / 2;
|
| 188 |
+
|
| 189 |
+
if (tipX + tipWidth > width) tipX = x - tipWidth - 12;
|
| 190 |
+
if (tipY < 0) tipY = 8;
|
| 191 |
+
if (tipY + tipHeight > height) tipY = height - tipHeight - 8;
|
| 192 |
+
|
| 193 |
+
tip.style.transform = `translate(${tipX}px, ${tipY}px)`;
|
| 194 |
+
tip.style.opacity = '1';
|
| 195 |
+
}
|
| 196 |
+
|
| 197 |
+
function hideTooltip() {
|
| 198 |
+
tip.style.opacity = '0';
|
| 199 |
+
tip.style.transform = 'translate(-9999px, -9999px)';
|
| 200 |
+
}
|
| 201 |
+
|
| 202 |
+
function render() {
|
| 203 |
+
if (!data) return;
|
| 204 |
+
|
| 205 |
+
const { innerWidth, innerHeight } = updateSize();
|
| 206 |
+
const models = data.models;
|
| 207 |
+
|
| 208 |
+
// Update scales
|
| 209 |
+
const xExtent = d3.extent(models, d => d.avg_failed_guesses);
|
| 210 |
+
const yExtent = d3.extent(models, d => d.avg_score);
|
| 211 |
+
const xPadding = (xExtent[1] - xExtent[0]) * 0.1;
|
| 212 |
+
const yPadding = (yExtent[1] - yExtent[0]) * 0.1;
|
| 213 |
+
|
| 214 |
+
xScale
|
| 215 |
+
.domain([Math.max(0, xExtent[0] - xPadding), xExtent[1] + xPadding])
|
| 216 |
+
.range([0, innerWidth])
|
| 217 |
+
.nice();
|
| 218 |
+
|
| 219 |
+
yScale
|
| 220 |
+
.domain([yExtent[0] - yPadding, yExtent[1] + yPadding])
|
| 221 |
+
.range([innerHeight, 0])
|
| 222 |
+
.nice();
|
| 223 |
+
|
| 224 |
+
// Grid lines
|
| 225 |
+
const xTicks = xScale.ticks(6);
|
| 226 |
+
const yTicks = yScale.ticks(6);
|
| 227 |
+
|
| 228 |
+
gGrid.selectAll('.grid-x')
|
| 229 |
+
.data(xTicks)
|
| 230 |
+
.join('line')
|
| 231 |
+
.attr('class', 'grid-x')
|
| 232 |
+
.attr('x1', d => xScale(d))
|
| 233 |
+
.attr('x2', d => xScale(d))
|
| 234 |
+
.attr('y1', 0)
|
| 235 |
+
.attr('y2', innerHeight);
|
| 236 |
+
|
| 237 |
+
gGrid.selectAll('.grid-y')
|
| 238 |
+
.data(yTicks)
|
| 239 |
+
.join('line')
|
| 240 |
+
.attr('class', 'grid-y')
|
| 241 |
+
.attr('x1', 0)
|
| 242 |
+
.attr('x2', innerWidth)
|
| 243 |
+
.attr('y1', d => yScale(d))
|
| 244 |
+
.attr('y2', d => yScale(d));
|
| 245 |
+
|
| 246 |
+
// Axes with inner ticks
|
| 247 |
+
const tickSize = 6;
|
| 248 |
+
gAxes.selectAll('.x-axis')
|
| 249 |
+
.data([0])
|
| 250 |
+
.join('g')
|
| 251 |
+
.attr('class', 'x-axis')
|
| 252 |
+
.attr('transform', `translate(0,${innerHeight})`)
|
| 253 |
+
.call(d3.axisBottom(xScale).ticks(6).tickSizeInner(-tickSize).tickSizeOuter(0));
|
| 254 |
+
|
| 255 |
+
gAxes.selectAll('.y-axis')
|
| 256 |
+
.data([0])
|
| 257 |
+
.join('g')
|
| 258 |
+
.attr('class', 'y-axis')
|
| 259 |
+
.call(d3.axisLeft(yScale).ticks(6).tickSizeInner(-tickSize).tickSizeOuter(0));
|
| 260 |
+
|
| 261 |
+
// Axis labels
|
| 262 |
+
gAxes.selectAll('.x-label')
|
| 263 |
+
.data([0])
|
| 264 |
+
.join('text')
|
| 265 |
+
.attr('class', 'x-label axis-label')
|
| 266 |
+
.attr('x', innerWidth / 2)
|
| 267 |
+
.attr('y', innerHeight + 44)
|
| 268 |
+
.attr('text-anchor', 'middle')
|
| 269 |
+
.text('Average Failed Guesses');
|
| 270 |
+
|
| 271 |
+
gAxes.selectAll('.y-label')
|
| 272 |
+
.data([0])
|
| 273 |
+
.join('text')
|
| 274 |
+
.attr('class', 'y-label axis-label')
|
| 275 |
+
.attr('x', -innerHeight / 2)
|
| 276 |
+
.attr('y', -52)
|
| 277 |
+
.attr('text-anchor', 'middle')
|
| 278 |
+
.attr('transform', 'rotate(-90)')
|
| 279 |
+
.text('Average Score');
|
| 280 |
+
|
| 281 |
+
// Points - circles for closed models, stars for open models
|
| 282 |
+
const pointRadius = Math.max(8, Math.min(16, innerWidth / 60));
|
| 283 |
+
|
| 284 |
+
// Helper function to create a 5-point star path
|
| 285 |
+
const starPath = (cx, cy, outerR, innerR) => {
|
| 286 |
+
const points = [];
|
| 287 |
+
for (let i = 0; i < 10; i++) {
|
| 288 |
+
const r = i % 2 === 0 ? outerR : innerR;
|
| 289 |
+
const angle = (Math.PI / 2) + (i * Math.PI / 5);
|
| 290 |
+
points.push([cx + r * Math.cos(angle), cy - r * Math.sin(angle)]);
|
| 291 |
+
}
|
| 292 |
+
return 'M' + points.map(p => p.join(',')).join('L') + 'Z';
|
| 293 |
+
};
|
| 294 |
+
|
| 295 |
+
// Closed models as circles
|
| 296 |
+
const closedModels = models.filter(d => !d.is_open);
|
| 297 |
+
gPoints.selectAll('.point-circle')
|
| 298 |
+
.data(closedModels, d => d.name)
|
| 299 |
+
.join('circle')
|
| 300 |
+
.attr('class', 'point point-circle')
|
| 301 |
+
.attr('cx', d => xScale(d.avg_failed_guesses))
|
| 302 |
+
.attr('cy', d => yScale(d.avg_score))
|
| 303 |
+
.attr('r', pointRadius)
|
| 304 |
+
.attr('fill', d => d.color)
|
| 305 |
+
.attr('stroke', 'none')
|
| 306 |
+
.on('mouseenter', showTooltip)
|
| 307 |
+
.on('mousemove', showTooltip)
|
| 308 |
+
.on('mouseleave', hideTooltip);
|
| 309 |
+
|
| 310 |
+
// Open models as stars
|
| 311 |
+
const openModels = models.filter(d => d.is_open);
|
| 312 |
+
gPoints.selectAll('.point-star')
|
| 313 |
+
.data(openModels, d => d.name)
|
| 314 |
+
.join('path')
|
| 315 |
+
.attr('class', 'point point-star')
|
| 316 |
+
.attr('d', d => starPath(xScale(d.avg_failed_guesses), yScale(d.avg_score), pointRadius * 1.2, pointRadius * 0.5))
|
| 317 |
+
.attr('fill', d => d.color)
|
| 318 |
+
.attr('stroke', 'none')
|
| 319 |
+
.on('mouseenter', showTooltip)
|
| 320 |
+
.on('mousemove', showTooltip)
|
| 321 |
+
.on('mouseleave', hideTooltip);
|
| 322 |
+
|
| 323 |
+
// Point labels
|
| 324 |
+
gLabels.selectAll('.point-label')
|
| 325 |
+
.data(models)
|
| 326 |
+
.join('text')
|
| 327 |
+
.attr('class', 'point-label')
|
| 328 |
+
.attr('x', d => xScale(d.avg_failed_guesses) + pointRadius + 6)
|
| 329 |
+
.attr('y', d => yScale(d.avg_score) + 4)
|
| 330 |
+
.text(d => d.name);
|
| 331 |
+
}
|
| 332 |
+
|
| 333 |
+
// Initialize
|
| 334 |
+
fetch(DATA_URL, { cache: 'no-cache' })
|
| 335 |
+
.then(r => r.json())
|
| 336 |
+
.then(json => {
|
| 337 |
+
data = json;
|
| 338 |
+
render();
|
| 339 |
+
})
|
| 340 |
+
.catch(err => {
|
| 341 |
+
const pre = document.createElement('pre');
|
| 342 |
+
pre.style.color = 'red';
|
| 343 |
+
pre.style.padding = '16px';
|
| 344 |
+
pre.textContent = `Error loading data: ${err.message}`;
|
| 345 |
+
container.appendChild(pre);
|
| 346 |
+
});
|
| 347 |
+
|
| 348 |
+
// Resize handling
|
| 349 |
+
if (window.ResizeObserver) {
|
| 350 |
+
new ResizeObserver(() => render()).observe(container);
|
| 351 |
+
} else {
|
| 352 |
+
window.addEventListener('resize', render);
|
| 353 |
+
}
|
| 354 |
+
|
| 355 |
+
// Theme change handling
|
| 356 |
+
const observer = new MutationObserver(() => render());
|
| 357 |
+
observer.observe(document.documentElement, {
|
| 358 |
+
attributes: true,
|
| 359 |
+
attributeFilter: ['data-theme']
|
| 360 |
+
});
|
| 361 |
+
};
|
| 362 |
+
|
| 363 |
+
if (document.readyState === 'loading') {
|
| 364 |
+
document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
|
| 365 |
+
} else {
|
| 366 |
+
ensureD3(bootstrap);
|
| 367 |
+
}
|
| 368 |
+
})();
|
| 369 |
+
</script>
|
dark-mode-image.md
ADDED
|
@@ -0,0 +1,48 @@
|
| 1 |
+
# Dark Mode Image Handling
|
| 2 |
+
|
| 3 |
+
## Problem
|
| 4 |
+
|
| 5 |
+
The blog template automatically inverts image colors in dark mode using a CSS filter:
|
| 6 |
+
|
| 7 |
+
```css
|
| 8 |
+
:global([data-theme="dark"]) .image-wrapper img {
|
| 9 |
+
filter: invert(0.925) hue-rotate(180deg);
|
| 10 |
+
}
|
| 11 |
+
```
|
| 12 |
+
|
| 13 |
+
This works well for charts and figures with white backgrounds, but is undesirable for images that should retain their original colors (e.g., photographs, illustrations with specific color schemes).
|
| 14 |
+
|
| 15 |
+
## Solution
|
| 16 |
+
|
| 17 |
+
Added a `preserveColors` prop to the `Image` component that opts out of the dark mode inversion.
|
| 18 |
+
|
| 19 |
+
### Usage
|
| 20 |
+
|
| 21 |
+
```mdx
|
| 22 |
+
import Image from "../../../components/Image.astro";
|
| 23 |
+
import myImage from "../../assets/image/my_image.png";
|
| 24 |
+
|
| 25 |
+
<Image
|
| 26 |
+
src={myImage}
|
| 27 |
+
alt="Description"
|
| 28 |
+
preserveColors
|
| 29 |
+
/>
|
| 30 |
+
```
|
| 31 |
+
|
| 32 |
+
### Implementation
|
| 33 |
+
|
| 34 |
+
**File: `app/src/components/Image.astro`**
|
| 35 |
+
|
| 36 |
+
1. Added `preserveColors?: boolean` to the Props interface
|
| 37 |
+
2. Added `data-preserve-colors` attribute to the wrapper div when the prop is true
|
| 38 |
+
3. Updated CSS selectors to exclude images with this attribute:
|
| 39 |
+
|
| 40 |
+
```css
|
| 41 |
+
:global([data-theme="dark"]) .image-wrapper:not([data-preserve-colors]) img {
|
| 42 |
+
filter: invert(0.925) hue-rotate(180deg);
|
| 43 |
+
}
|
| 44 |
+
```
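To see why the filter suits figures with white backgrounds, it can help to compute what `invert(0.925)` does to a single color. The sketch below models only the `invert()` step, assuming the linear per-channel mapping described in the CSS Filter Effects specification as I understand it (`out = amount + in * (1 - 2 * amount)` on 0..1 channel values); `hue-rotate(180deg)` is not modeled, and the function names are hypothetical.

```javascript
// Hypothetical helpers: approximate the invert() part of the dark-mode filter.
// Assumption: invert(amount) maps each channel c in [0, 1] to
// amount + c * (1 - 2 * amount), per the Filter Effects specification.
function invertChannel(c, amount) {
  return amount + c * (1 - 2 * amount);
}

// Apply the mapping to an [r, g, b] triple with 0..255 channels.
function invertRgb(rgb, amount) {
  return rgb.map(v => Math.round(255 * invertChannel(v / 255, amount)));
}
```

Under that assumption, `invertRgb([255, 255, 255], 0.925)` gives `[19, 19, 19]`: a white chart background becomes a very dark gray rather than pure black, which is presumably why the amount is slightly below 1.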
|
| 45 |
+
|
| 46 |
+
### Current Usage
|
| 47 |
+
|
| 48 |
+
- `introduction.mdx`: The `example_sequence.png` image uses `preserveColors` to maintain the card colors in dark mode
|
interactive-charts.md
ADDED
|
@@ -0,0 +1,498 @@
|
| 1 |
+
# Converting Static Figures to Interactive D3 Charts
|
| 2 |
+
|
| 3 |
+
This guide explains how to convert PNG figures into interactive D3.js visualizations for this project.
|
| 4 |
+
|
| 5 |
+
## Overview
|
| 6 |
+
|
| 7 |
+
Each interactive chart consists of:
|
| 8 |
+
1. **JSON data file** in `app/public/data/` (served at `/data/filename.json`)
|
| 9 |
+
2. **HTML embed file** in `app/src/content/embeds/` (e.g., `chart-name.html`)
|
| 10 |
+
3. **MDX integration** using the `HtmlEmbed` component
|
| 11 |
+
|
| 12 |
+
## File Structure
|
| 13 |
+
|
| 14 |
+
```
|
| 15 |
+
app/
|
| 16 |
+
├── public/data/ # JSON data (served at /data/*)
|
| 17 |
+
│ ├── overall_performance.json
|
| 18 |
+
│ ├── calibration_curves.json
|
| 19 |
+
│ └── ...
|
| 20 |
+
└── src/content/embeds/ # HTML chart implementations
|
| 21 |
+
├── banner.html # Example: scatter plot
|
| 22 |
+
└── calibration-curves.html # Example: multi-line chart
|
| 23 |
+
```
|
| 24 |
+
|
| 25 |
+
## Step 1: Understand Your Data
|
| 26 |
+
|
| 27 |
+
Check the JSON structure in `app/public/data/`. Common patterns:
|
| 28 |
+
|
| 29 |
+
**Scatter plot** (`overall_performance.json`):
|
| 30 |
+
```json
|
| 31 |
+
{
|
| 32 |
+
"models": [
|
| 33 |
+
{ "name": "Model A", "avg_score": 15.8, "avg_output_tokens_per_turn": 5253, "color": "#FF6B00", "is_open": false }
|
| 34 |
+
]
|
| 35 |
+
}
|
| 36 |
+
```
|
| 37 |
+
|
| 38 |
+
**Line chart / Calibration** (`calibration_curves.json`):
|
| 39 |
+
```json
|
| 40 |
+
{
|
| 41 |
+
"models": [
|
| 42 |
+
{
|
| 43 |
+
"name": "Model A", "color": "#FF6B00",
|
| 44 |
+
"calibration_points": [
|
| 45 |
+
{ "confidence_level": 5, "actual_success_rate": 0.041, "sample_count": 73 }
|
| 46 |
+
]
|
| 47 |
+
}
|
| 48 |
+
]
|
| 49 |
+
}
|
| 50 |
+
```
|
| 51 |
+
|
| 52 |
+
**Histogram** (`confidence_distribution.json`):
|
| 53 |
+
```json
|
| 54 |
+
{
|
| 55 |
+
"models": [
|
| 56 |
+
{
|
| 57 |
+
"name": "Model A", "color": "#FF6B00", "total_guesses": 579,
|
| 58 |
+
"distribution": [
|
| 59 |
+
{ "confidence_level": 5, "proportion": 0.024, "count": 14 }
|
| 60 |
+
]
|
| 61 |
+
}
|
| 62 |
+
]
|
| 63 |
+
}
|
| 64 |
+
```
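Before wiring a data file into a chart, a quick shape check can catch missing fields early. The following is a minimal sketch, not part of the project: `findMissingFields` is a hypothetical helper that assumes the top-level `models` array shown in the patterns above and reports which required keys are absent.

```javascript
// Hypothetical helper: list problems with a { models: [...] } payload.
// requiredKeys is whatever the chart reads, e.g. ['name', 'color', 'avg_score'].
function findMissingFields(json, requiredKeys) {
  if (!json || !Array.isArray(json.models)) {
    return ['top-level "models" array is missing'];
  }
  const problems = [];
  json.models.forEach((model, i) => {
    for (const key of requiredKeys) {
      if (!(key in model)) problems.push(`models[${i}] is missing "${key}"`);
    }
  });
  return problems;
}
```

An empty array means every model carries the requested keys; the function could be called from the `.then(json => ...)` handler before `render()` if that kind of guard is wanted.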
|
| 65 |
+
|
| 66 |
+
## Step 2: Create the HTML Embed
|
| 67 |
+
|
| 68 |
+
Create a new file in `app/src/content/embeds/`. Use this template:
|
| 69 |
+
|
| 70 |
+
```html
|
| 71 |
+
<div class="d3-CHART-NAME"></div>
|
| 72 |
+
<style>
|
| 73 |
+
/* Scoped styles - prefix everything with .d3-CHART-NAME */
|
| 74 |
+
.d3-CHART-NAME {
|
| 75 |
+
width: 100%;
|
| 76 |
+
margin: 10px 0;
|
| 77 |
+
position: relative;
|
| 78 |
+
font-family: system-ui, -apple-system, sans-serif;
|
| 79 |
+
}
|
| 80 |
+
|
| 81 |
+
.d3-CHART-NAME svg {
|
| 82 |
+
display: block;
|
| 83 |
+
width: 100%;
|
| 84 |
+
height: auto;
|
| 85 |
+
}
|
| 86 |
+
|
| 87 |
+
/* Use CSS variables for theme support */
|
| 88 |
+
.d3-CHART-NAME .axes path,
|
| 89 |
+
.d3-CHART-NAME .axes line {
|
| 90 |
+
stroke: var(--axis-color, var(--text-color));
|
| 91 |
+
}
|
| 92 |
+
|
| 93 |
+
.d3-CHART-NAME .axes text {
|
| 94 |
+
fill: var(--tick-color, var(--muted-color));
|
| 95 |
+
font-size: 11px;
|
| 96 |
+
}
|
| 97 |
+
|
| 98 |
+
.d3-CHART-NAME .grid line {
|
| 99 |
+
stroke: var(--grid-color, rgba(0,0,0,.08));
|
| 100 |
+
}
|
| 101 |
+
|
| 102 |
+
/* Use specific selector to override .axes text */
|
| 103 |
+
.d3-CHART-NAME .axes text.axis-label {
|
| 104 |
+
font-size: 14px;
|
| 105 |
+
font-weight: 500;
|
| 106 |
+
fill: var(--text-color);
|
| 107 |
+
}
|
| 108 |
+
|
| 109 |
+
.d3-CHART-NAME .axes text.chart-title {
|
| 110 |
+
font-size: 16px;
|
| 111 |
+
font-weight: 600;
|
| 112 |
+
fill: var(--text-color);
|
| 113 |
+
}
|
| 114 |
+
|
| 115 |
+
/* Adjust tick label spacing if needed */
|
| 116 |
+
.d3-CHART-NAME .x-axis text {
|
| 117 |
+
transform: translateY(4px);
|
| 118 |
+
}
|
| 119 |
+
|
| 120 |
+
/* Tooltip */
|
| 121 |
+
.d3-CHART-NAME .d3-tooltip {
|
| 122 |
+
position: absolute;
|
| 123 |
+
top: 0; left: 0;
|
| 124 |
+
transform: translate(-9999px, -9999px);
|
| 125 |
+
pointer-events: none;
|
| 126 |
+
padding: 10px 12px;
|
| 127 |
+
border-radius: 8px;
|
| 128 |
+
font-size: 12px;
|
| 129 |
+
line-height: 1.4;
|
| 130 |
+
border: 1px solid var(--border-color);
|
| 131 |
+
background: var(--surface-bg);
|
| 132 |
+
color: var(--text-color);
|
| 133 |
+
box-shadow: 0 4px 24px rgba(0,0,0,.18);
|
| 134 |
+
opacity: 0;
|
| 135 |
+
transition: opacity 0.12s ease;
|
| 136 |
+
z-index: 10;
|
| 137 |
+
}
|
| 138 |
+
</style>
|
| 139 |
+
<script>
|
| 140 |
+
(() => {
|
| 141 |
+
// D3 loader - reuses existing if already loaded
|
| 142 |
+
const ensureD3 = (cb) => {
|
| 143 |
+
if (window.d3 && typeof window.d3.select === 'function') return cb();
|
| 144 |
+
let s = document.getElementById('d3-cdn-script');
|
| 145 |
+
if (!s) {
|
| 146 |
+
s = document.createElement('script');
|
| 147 |
+
s.id = 'd3-cdn-script';
|
| 148 |
+
s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js';
|
| 149 |
+
document.head.appendChild(s);
|
| 150 |
+
}
|
| 151 |
+
const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
|
| 152 |
+
s.addEventListener('load', onReady, { once: true });
|
| 153 |
+
if (window.d3) onReady();
|
| 154 |
+
};
|
| 155 |
+
|
| 156 |
+
const bootstrap = () => {
|
| 157 |
+
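// Find container (handles multiple instances). document.currentScript is null when bootstrap runs from an async callback, so the class-based fallback below covers that case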
|
| 158 |
+
const scriptEl = document.currentScript;
|
| 159 |
+
let container = scriptEl ? scriptEl.previousElementSibling : null;
|
| 160 |
+
if (!(container && container.classList && container.classList.contains('d3-CHART-NAME'))) {
|
| 161 |
+
const candidates = Array.from(document.querySelectorAll('.d3-CHART-NAME'))
|
| 162 |
+
.filter((el) => !(el.dataset && el.dataset.mounted === 'true'));
|
| 163 |
+
container = candidates[candidates.length - 1] || null;
|
| 164 |
+
}
|
| 165 |
+
if (!container) return;
|
| 166 |
+
if (container.dataset) {
|
| 167 |
+
if (container.dataset.mounted === 'true') return;
|
| 168 |
+
container.dataset.mounted = 'true';
|
| 169 |
+
}
|
| 170 |
+
|
| 171 |
+
// Tooltip setup
|
| 172 |
+
container.style.position = container.style.position || 'relative';
|
| 173 |
+
const tip = document.createElement('div');
|
| 174 |
+
tip.className = 'd3-tooltip';
|
| 175 |
+
container.appendChild(tip);
|
| 176 |
+
|
| 177 |
+
// SVG setup
|
| 178 |
+
const svg = d3.select(container).append('svg');
|
| 179 |
+
const gRoot = svg.append('g');
|
| 180 |
+
|
| 181 |
+
// Chart groups (order matters for layering)
|
| 182 |
+
const gGrid = gRoot.append('g').attr('class', 'grid');
|
| 183 |
+
const gAxes = gRoot.append('g').attr('class', 'axes');
|
| 184 |
+
const gContent = gRoot.append('g').attr('class', 'content');
|
| 185 |
+
|
| 186 |
+
// State
|
| 187 |
+
let data = null;
|
| 188 |
+
let width = 800;
|
| 189 |
+
let height = 450;
|
| 190 |
+
const margin = { top: 40, right: 120, bottom: 56, left: 72 };
|
| 191 |
+
|
| 192 |
+
// Scales
|
| 193 |
+
const xScale = d3.scaleLinear();
|
| 194 |
+
const yScale = d3.scaleLinear();
|
| 195 |
+
|
| 196 |
+
// Data loading - single path since we use public/data/
|
| 197 |
+
const DATA_URL = '/data/YOUR_DATA_FILE.json';
|
| 198 |
+
|
| 199 |
+
function updateSize() {
|
| 200 |
+
width = container.clientWidth || 800;
|
| 201 |
+
height = Math.max(300, Math.round(width / 1.78)); // 16:9 aspect ratio
|
| 202 |
+
svg.attr('width', width).attr('height', height).attr('viewBox', `0 0 ${width} ${height}`);
|
| 203 |
+
gRoot.attr('transform', `translate(${margin.left},${margin.top})`);
|
| 204 |
+
return {
|
| 205 |
+
innerWidth: width - margin.left - margin.right,
|
| 206 |
+
innerHeight: height - margin.top - margin.bottom
|
| 207 |
+
};
|
| 208 |
+
}
|
| 209 |
+
|
| 210 |
+
function showTooltip(event, d) {
|
| 211 |
+
const rect = container.getBoundingClientRect();
|
| 212 |
+
const x = event.clientX - rect.left;
|
| 213 |
+
const y = event.clientY - rect.top;
|
| 214 |
+
|
| 215 |
+
tip.innerHTML = `
|
| 216 |
+
<div style="font-weight: 600; color: ${d.color}">${d.name}</div>
|
| 217 |
+
<div>Value: ${d.value}</div>
|
| 218 |
+
`;
|
| 219 |
+
|
| 220 |
+
const tipWidth = tip.offsetWidth || 150;
|
| 221 |
+
const tipHeight = tip.offsetHeight || 80;
|
| 222 |
+
let tipX = x + 12;
|
| 223 |
+
let tipY = y - tipHeight / 2;
|
| 224 |
+
|
| 225 |
+
if (tipX + tipWidth > width) tipX = x - tipWidth - 12;
|
| 226 |
+
if (tipY < 0) tipY = 8;
|
| 227 |
+
if (tipY + tipHeight > height) tipY = height - tipHeight - 8;
|
| 228 |
+
|
| 229 |
+
tip.style.transform = `translate(${tipX}px, ${tipY}px)`;
|
| 230 |
+
tip.style.opacity = '1';
|
| 231 |
+
}
|
| 232 |
+
|
| 233 |
+
function hideTooltip() {
|
| 234 |
+
tip.style.opacity = '0';
|
| 235 |
+
tip.style.transform = 'translate(-9999px, -9999px)';
|
| 236 |
+
}
|
| 237 |
+
|
| 238 |
+
function render() {
|
| 239 |
+
if (!data) return;
|
| 240 |
+
const { innerWidth, innerHeight } = updateSize();
|
| 241 |
+
|
| 242 |
+
// TODO: Implement your chart rendering here
|
| 243 |
+
// - Update scales with data extent
|
| 244 |
+
// - Draw grid lines
|
| 245 |
+
// - Draw axes
|
| 246 |
+
// - Draw data elements (lines, bars, points, etc.)
|
| 247 |
+
}
|
| 248 |
+
|
| 249 |
+
// Initialize
|
| 250 |
+
fetch(DATA_URL, { cache: 'no-cache' })
|
| 251 |
+
.then(r => r.json())
|
| 252 |
+
.then(json => {
|
| 253 |
+
data = json;
|
| 254 |
+
render();
|
| 255 |
+
})
|
| 256 |
+
.catch(err => {
|
| 257 |
+
const pre = document.createElement('pre');
|
| 258 |
+
pre.style.color = 'red';
|
| 259 |
+
pre.style.padding = '16px';
|
| 260 |
+
pre.textContent = `Error loading data: ${err.message}`;
|
| 261 |
+
container.appendChild(pre);
|
| 262 |
+
});
|
| 263 |
+
|
| 264 |
+
// Resize handling
|
| 265 |
+
if (window.ResizeObserver) {
|
| 266 |
+
new ResizeObserver(() => render()).observe(container);
|
| 267 |
+
} else {
|
| 268 |
+
window.addEventListener('resize', render);
|
| 269 |
+
}
|
| 270 |
+
|
| 271 |
+
// Theme change handling (re-render on light/dark toggle)
|
| 272 |
+
const observer = new MutationObserver(() => render());
|
| 273 |
+
observer.observe(document.documentElement, {
|
| 274 |
+
attributes: true,
|
| 275 |
+
attributeFilter: ['data-theme']
|
| 276 |
+
});
|
| 277 |
+
};
|
| 278 |
+
|
| 279 |
+
if (document.readyState === 'loading') {
|
| 280 |
+
document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
|
| 281 |
+
} else {
|
| 282 |
+
ensureD3(bootstrap);
|
| 283 |
+
}
|
| 284 |
+
})();
|
| 285 |
+
</script>
|
| 286 |
+
```
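The tooltip placement inside `showTooltip` in the template above can be read more easily as a pure function. This is a sketch that mirrors the same offsets and clamping rules; `placeTooltip` is a hypothetical name and is not used by the existing embeds.

```javascript
// Hypothetical helper mirroring the template's tooltip placement rules.
// x, y: pointer position relative to the container; width, height: chart size.
function placeTooltip(x, y, tipWidth, tipHeight, width, height) {
  let tipX = x + 12;             // default: 12px to the right of the pointer
  let tipY = y - tipHeight / 2;  // vertically centered on the pointer

  if (tipX + tipWidth > width) tipX = x - tipWidth - 12;           // flip left near the right edge
  if (tipY < 0) tipY = 8;                                          // keep below the top edge
  if (tipY + tipHeight > height) tipY = height - tipHeight - 8;    // keep above the bottom edge

  return { tipX, tipY };
}
```

For an 800×450 chart and a 150×80 tooltip, a pointer at (700, 10) yields `{ tipX: 538, tipY: 8 }`: the tooltip flips to the left of the pointer and is pushed down from the top edge.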
|
| 287 |
+
|
| 288 |
+
## Step 3: Key Implementation Details
|
| 289 |
+
|
| 290 |
+
### CSS Variables (Theme Support)
|
| 291 |
+
|
| 292 |
+
Always use CSS variables for colors that need to adapt to light/dark mode:
|
| 293 |
+
|
| 294 |
+
| Variable | Purpose |
|
| 295 |
+
|----------|---------|
|
| 296 |
+
| `var(--text-color)` | Main text, labels |
|
| 297 |
+
| `var(--muted-color)` | Secondary text, tick labels |
|
| 298 |
+
| `var(--border-color)` | Borders, outlines |
|
| 299 |
+
| `var(--surface-bg)` | Tooltip background |
|
| 300 |
+
| `var(--page-bg)` | Page background |
|
| 301 |
+
|
| 302 |
+
### D3 Patterns Used
|
| 303 |
+
|
| 304 |
+
**Scale setup:**
|
| 305 |
+
```javascript
|
| 306 |
+
const xExtent = d3.extent(data, d => d.x);
|
| 307 |
+
const xPadding = (xExtent[1] - xExtent[0]) * 0.1;
|
| 308 |
+
xScale.domain([xExtent[0] - xPadding, xExtent[1] + xPadding])
|
| 309 |
+
.range([0, innerWidth])
|
| 310 |
+
.nice();
|
| 311 |
+
```
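The padding arithmetic in the scale setup can be checked without D3. The sketch below uses a hypothetical `paddedDomain` helper that reproduces the 10% padding and optionally clamps the lower bound (one of the embeds in this commit clamps its x domain at 0 with `Math.max(0, ...)`); the extra rounding that `.nice()` applies is not modeled.

```javascript
// Hypothetical helper: compute a padded [min, max] domain from raw values.
// fraction is the share of the data range added on each side;
// floor optionally clamps the lower bound (e.g. 0 for counts).
function paddedDomain(values, fraction = 0.1, floor = -Infinity) {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  const pad = (hi - lo) * fraction;
  return [Math.max(floor, lo - pad), hi + pad];
}
```

With values spanning 2 to 12, the range is 10, so the padded domain is `[1, 13]`; passing `0` as the floor for values spanning 0 to 10 gives `[0, 11]` instead of `[-1, 11]`.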
|
| 312 |
+
|
| 313 |
+
**Grid lines:**
|
| 314 |
+
```javascript
|
| 315 |
+
gGrid.selectAll('.grid-x')
|
| 316 |
+
.data(xScale.ticks(6))
|
| 317 |
+
.join('line')
|
| 318 |
+
.attr('class', 'grid-x')
|
| 319 |
+
.attr('x1', d => xScale(d))
|
| 320 |
+
.attr('x2', d => xScale(d))
|
| 321 |
+
.attr('y1', 0)
|
| 322 |
+
.attr('y2', innerHeight);
|
| 323 |
+
```
|
| 324 |
+
|
| 325 |
+
**Axes (basic):**
|
| 326 |
+
```javascript
|
| 327 |
+
gAxes.selectAll('.x-axis')
|
| 328 |
+
.data([0])
|
| 329 |
+
.join('g')
|
| 330 |
+
.attr('class', 'x-axis')
|
| 331 |
+
.attr('transform', `translate(0,${innerHeight})`)
|
| 332 |
+
.call(d3.axisBottom(xScale).ticks(6));
|
| 333 |
+
```
|
| 334 |
+
|
| 335 |
+
**Axes with inner ticks:**
|
| 336 |
+
```javascript
|
| 337 |
+
const tickSize = 6;
|
| 338 |
+
gAxes.selectAll('.x-axis')
|
| 339 |
+
.data([0])
|
| 340 |
+
.join('g')
|
| 341 |
+
.attr('class', 'x-axis')
|
| 342 |
+
.attr('transform', `translate(0,${innerHeight})`)
|
| 343 |
+
.call(d3.axisBottom(xScale)
|
| 344 |
+
.ticks(6)
|
| 345 |
+
.tickSizeInner(-tickSize) // Negative = ticks point inward
|
| 346 |
+
.tickSizeOuter(0)); // No outer ticks
|
| 347 |
+
```
|
| 348 |
+
|
| 349 |
+
**Custom shapes (5-point star):**
|
| 350 |
+
```javascript
|
| 351 |
+
const starPath = (cx, cy, outerR, innerR) => {
|
| 352 |
+
const points = [];
|
| 353 |
+
for (let i = 0; i < 10; i++) {
|
| 354 |
+
const r = i % 2 === 0 ? outerR : innerR;
|
| 355 |
+
const angle = (Math.PI / 2) + (i * Math.PI / 5);
|
| 356 |
+
points.push([cx + r * Math.cos(angle), cy - r * Math.sin(angle)]);
|
| 357 |
+
}
|
| 358 |
+
return 'M' + points.map(p => p.join(',')).join('L') + 'Z';
|
| 359 |
+
};
|
| 360 |
+
|
| 361 |
+
// Use with path elements
|
| 362 |
+
gContent.selectAll('.point-star')
|
| 363 |
+
.data(openModels)
|
| 364 |
+
.join('path')
|
| 365 |
+
.attr('d', d => starPath(xScale(d.x), yScale(d.y), radius * 1.2, radius * 0.5))
|
| 366 |
+
.attr('fill', d => d.color);
|
| 367 |
+
```
|
| 368 |
+
|
| 369 |
+
**Data-join for elements:**
|
| 370 |
+
```javascript
|
| 371 |
+
gContent.selectAll('.point')
|
| 372 |
+
.data(models)
|
| 373 |
+
.join('circle')
|
| 374 |
+
.attr('class', 'point')
|
| 375 |
+
.attr('cx', d => xScale(d.x))
|
| 376 |
+
.attr('cy', d => yScale(d.y))
|
| 377 |
+
.attr('r', 8)
|
| 378 |
+
.attr('fill', d => d.color)
|
| 379 |
+
.on('mouseenter', showTooltip)
|
| 380 |
+
.on('mousemove', showTooltip)
|
| 381 |
+
.on('mouseleave', hideTooltip);
|
| 382 |
+
```
|
| 383 |
+
|
| 384 |
+
## Step 4: Integrate in MDX
|
| 385 |
+
|
| 386 |
+
In your `.mdx` file:
|
| 387 |
+
|
| 388 |
+
```mdx
|
| 389 |
+
import HtmlEmbed from "../../../components/HtmlEmbed.astro";
|
| 390 |
+
|
| 391 |
+
<HtmlEmbed
|
| 392 |
+
src="chart-name.html"
|
| 393 |
+
title="Chart Title"
|
| 394 |
+
caption="<strong>Figure N:</strong> Description of what this shows."
|
| 395 |
+
/>
|
| 396 |
+
```
|
| 397 |
+
|
| 398 |
+
For frameless embedding (like the banner):
|
| 399 |
+
```mdx
|
| 400 |
+
<HtmlEmbed src="banner.html" frameless />
|
| 401 |
+
```
|
| 402 |
+
|
| 403 |
+
## Charts to Convert
|
| 404 |
+
|
| 405 |
+
| Figure | Data File | Chart Type | Status |
|
| 406 |
+
|--------|-----------|------------|--------|
|
| 407 |
+
| 1 | `overall_performance.json` | Scatter | Done (banner.html) |
|
| 408 |
+
| 2 | `calibration_curves.json` | Multi-line | Done (calibration-curves.html) |
|
| 409 |
+
| 3 | `confidence_distribution.json` | Grouped histogram | Done (confidence-distribution.html) |
|
| 410 |
+
| 4 | `score_vs_failed_guesses.json` | Scatter | TODO |
|
| 411 |
+
| 5 | `excess_caution.json` | Box plot | TODO |
|
| 412 |
+
| 6 | `caution_vs_failed_guesses.json` | Scatter | Done (caution-vs-failed-guesses.html) |
|
| 413 |
+
| 7 | `by_rule.json` | Strip plot | Done (by-rule.html) |
|
| 414 |
+
| 8 | `complexity_analysis.json` | Heatmap | Done (complexity-analysis.html) |
|
| 415 |
+
|
| 416 |
+
## Testing
|
| 417 |
+
|
| 418 |
+
1. Run dev server: `cd app && npm run dev`
|
| 419 |
+
2. Check the chart loads at the correct URL
|
| 420 |
+
3. Verify tooltip interactions
|
| 421 |
+
4. Toggle light/dark mode to check theme support
|
| 422 |
+
5. Resize the window to verify responsiveness
|
| 423 |
+
|
| 424 |
+
## Debugging Tips
|
| 425 |
+
|
| 426 |
+
- Open browser console to see data loading errors
|
| 427 |
+
- Check Network tab to verify `/data/filename.json` is being fetched
|
| 428 |
+
- If the chart doesn't render, check that `container.dataset.mounted` isn't already 'true'
|
| 429 |
+
- CSS scoping: always prefix selectors with `.d3-CHART-NAME`
|
| 430 |
+
|
| 431 |
+
## Common Gotchas
|
| 432 |
+
|
| 433 |
+
### Using `.style()` vs `.attr()` for Dynamic Colors
|
| 434 |
+
|
| 435 |
+
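When a stylesheet rule also sets `fill` or `stroke` on an element, set data-driven colors with `.style()` instead of `.attr()`. SVG presentation attributes are overridden by any matching CSS rule, whereas inline styles take precedence over it: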
|
| 436 |
+
|
| 437 |
+
```javascript
|
| 438 |
+
// WON'T WORK when a CSS rule targets the element - presentation attributes rank below any stylesheet rule
|
| 439 |
+
.attr('fill', d => getContrastColor(d.color))
|
| 440 |
+
|
| 441 |
+
// USE THIS - inline styles have higher specificity
|
| 442 |
+
.style('fill', d => getContrastColor(d.color))
|
| 443 |
+
```
|
| 444 |
+
|
| 445 |
+
This is especially important for text labels where you need to calculate contrast colors dynamically. Example contrast function:
|
| 446 |
+
|
| 447 |
+
```javascript
|
| 448 |
+
function getContrastColor(hexColor) {
|
| 449 |
+
const hex = hexColor.replace('#', '');
|
| 450 |
+
const r = parseInt(hex.slice(0, 2), 16) / 255;
|
| 451 |
+
const g = parseInt(hex.slice(2, 4), 16) / 255;
|
| 452 |
+
const b = parseInt(hex.slice(4, 6), 16) / 255;
|
| 453 |
+
const luminance = 0.299 * r + 0.587 * g + 0.114 * b;
|
| 454 |
+
return luminance > 0.5 ? '#000000' : '#ffffff';
|
| 455 |
+
}
|
| 456 |
+
|
| 457 |
+
// Usage
|
| 458 |
+
gLabels.selectAll('.label')
|
| 459 |
+
.data(items)
|
| 460 |
+
.join('text')
|
| 461 |
+
.style('fill', d => getContrastColor(d.color))
|
| 462 |
+
.text(d => d.name);
|
| 463 |
+
```
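The contrast function above reads three two-digit pairs, so it expects a six-digit hex color. If a data file might contain shorthand such as `#F60`, the value can be normalized first. This is a sketch under that assumption; `normalizeHex` is a hypothetical helper, not something the embeds currently use.

```javascript
// Hypothetical helper: expand 3-digit hex shorthand and validate the input.
// Returns a lowercase '#rrggbb' string, or null if the input is not a hex color.
function normalizeHex(color) {
  const hex = color.trim().replace(/^#/, '');
  if (/^[0-9a-fA-F]{3}$/.test(hex)) {
    // '#f60' -> '#ff6600': each digit is doubled
    return '#' + hex.split('').map(c => c + c).join('').toLowerCase();
  }
  if (/^[0-9a-fA-F]{6}$/.test(hex)) return '#' + hex.toLowerCase();
  return null;
}
```

A caller could fall back to a default text color when `normalizeHex` returns `null`, rather than passing an unparseable value to `getContrastColor`.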
|
| 464 |
+
|
| 465 |
+
### CSS Specificity for Axis Labels
|
| 466 |
+
|
| 467 |
+
The generic `.axes text` rule applies to ALL text inside the axes group, including axis labels. To style axis labels differently, use a more specific selector:
|
| 468 |
+
|
| 469 |
+
```css
|
| 470 |
+
/* This won't work - gets overridden by .axes text */
|
| 471 |
+
.d3-CHART-NAME .axis-label {
|
| 472 |
+
font-size: 15px;
|
| 473 |
+
}
|
| 474 |
+
|
| 475 |
+
/* Use this instead - more specific */
|
| 476 |
+
.d3-CHART-NAME .axes text.axis-label {
|
| 477 |
+
font-size: 15px;
|
| 478 |
+
font-weight: 500;
|
| 479 |
+
fill: var(--text-color);
|
| 480 |
+
}
|
| 481 |
+
```
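The reason the longer selector wins can be made concrete by counting specificity. The sketch below is deliberately simplified: `simpleSpecificity` is a hypothetical helper that only understands space-separated compounds made of class names and an optional leading type name, which is enough for the selectors in this guide but not for CSS in general.

```javascript
// Hypothetical, simplified specificity counter: returns [ids, classes, types].
// Only handles descendant selectors built from .classes and type names.
function simpleSpecificity(selector) {
  let classes = 0;
  let types = 0;
  for (const compound of selector.trim().split(/\s+/)) {
    classes += (compound.match(/\.[\w-]+/g) || []).length; // count .class parts
    if (/^[a-zA-Z]/.test(compound)) types += 1;            // leading type name, e.g. "text"
  }
  return [0, classes, types];
}
```

Under this counting, `.d3-CHART-NAME .axis-label` scores `[0, 2, 0]`, `.d3-CHART-NAME .axes text` scores `[0, 2, 1]`, and `.d3-CHART-NAME .axes text.axis-label` scores `[0, 3, 1]`, so only the last one outranks the generic `.axes text` rule.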
|
| 482 |
+
|
| 483 |
+
### Adjusting Tick Label Position
|
| 484 |
+
|
| 485 |
+
To move X-axis tick labels down (add spacing from the axis line):
|
| 486 |
+
|
| 487 |
+
```css
|
| 488 |
+
.d3-CHART-NAME .x-axis text {
|
| 489 |
+
transform: translateY(4px);
|
| 490 |
+
}
|
| 491 |
+
```
|
| 492 |
+
|
| 493 |
+
### Removing Chart Elements
|
| 494 |
+
|
| 495 |
+
When you don't need a title or legend:
|
| 496 |
+
1. Remove the rendering code from `render()`
|
| 497 |
+
2. Remove the CSS styles
|
| 498 |
+
3. Adjust margins accordingly (e.g., reduce `margin.top` if no title)
|