Commit ddd533d · Parent(s): 6293309

add compression analysis

app/src/content/chapters/analyses.mdx (CHANGED)
@@ -100,6 +100,23 @@ Different prompt formats produce wildly different output lengths. <FigRef target
</Wide>

### Does Compression Predict Performance?

Our prompts produce outputs ranging from roughly 25% of the input length (Table and Commentary) to 150% (Guided Rewrite at 12B). Does the degree of compression or expansion matter for downstream performance? <FigRef target="compression-performance" /> plots each experiment's compression ratio against its benchmark score.

**We see no meaningful relationship between compression ratio and performance.** Highly compressive prompts (Table at 0.25x, Commentary at 0.26x) and expansive ones (Guided Rewrite at 1.5x) both appear across the full range of scores. The best-performing experiments cluster between roughly 0.3x and 0.8x, but this likely reflects which prompt types fall in that range rather than any causal effect of compression itself: FAQ and Tutorial prompts happen to compress moderately, and they are also the strongest prompts for other reasons (pedagogical restructuring, diverse output formats).
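The scatter plot answers this question visually; a rank correlation across all runs is one way to put a number on it. The sketch below is a minimal example under stated assumptions: it assumes `rephrasing_metadata.json` is a JSON list of records carrying `compression_ratio` and a `results` dict keyed by metric name (the fields the `compression-performance.html` embed reads), and the helper names are our own. Spearman's rho is our choice here, not necessarily what was used for this analysis.

```python
import json
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def compression_vs_score(path: str, metric: str = "agg_score_macro") -> float:
    """Rank correlation between compression ratio and one metric across runs.

    Assumed (not confirmed) record shape: {"compression_ratio": ..., "results": {metric: ...}}.
    Runs missing either value are skipped.
    """
    with open(path) as f:
        records = json.load(f)
    pairs = [
        (r["compression_ratio"], r["results"].get(metric))
        for r in records
        if r.get("compression_ratio") is not None and r["results"].get(metric) is not None
    ]
    xs, ys = zip(*pairs)
    return spearman(xs, ys)
```

For example, `compression_vs_score("rephrasing_metadata.json", "agg_score_macro")` returns a value between -1 and 1. A rho near zero is consistent with the finding above, but a rank correlation only detects monotonic trends, so it would not capture a sweet spot such as the 0.3x to 0.8x cluster; read it alongside the plot rather than instead of it.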

This means you should not use compression ratio as a proxy for data quality. A prompt that produces concise outputs is not inherently better or worse than one that produces verbose outputs. What matters is the content and structure of the output, not its length relative to the input.

<HtmlEmbed
  id="compression-performance"
  src="compression-performance.html"
  data="rephrasing_metadata.json"
  desc="Compression ratio (output/input tokens) vs downstream performance. The dashed line marks ratio = 1.0 (no compression). Hover over points for details."
/>

### Is More Compute Worth It?

GPU time across our 65 experiments varies by more than 50x: the cheapest run (Table with SmolLM2) took 8 days, while the most expensive (Guided Rewrite with Gemma-3 27B) consumed over 15 months of GPU time. <FigRef target="cost-efficiency" /> plots each experiment's downstream performance against its GPU cost on a log scale, with a Pareto frontier connecting the most efficient configurations.
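A Pareto frontier keeps only the runs that no other run beats on both axes at once. A minimal sketch of how such a frontier can be computed, assuming each experiment is reduced to a GPU cost (lower is better) and a score (higher is better); the `Run` record and its field names are hypothetical rather than the project's data schema:

```python
from typing import NamedTuple


class Run(NamedTuple):
    name: str
    gpu_days: float  # hypothetical cost field: lower is better
    score: float     # hypothetical benchmark field: higher is better


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Return the non-dominated runs, ordered from cheapest to most expensive.

    Sort by cost (ties broken by higher score first), then sweep left to right,
    keeping a run only if it strictly beats the best score of every cheaper run.
    """
    frontier: list[Run] = []
    best = float("-inf")
    for run in sorted(runs, key=lambda r: (r.gpu_days, -r.score)):
        if run.score > best:
            frontier.append(run)
            best = run.score
    return frontier
```

Because a log transform preserves ordering, drawing cost on a log axis does not change which runs lie on the frontier. The strict `>` comparison also drops a run that costs more than a frontier run without scoring higher.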

app/src/content/embeds/compression-performance.html (ADDED)
@@ -0,0 +1,375 @@
<div class="d3-compression-perf" style="width:100%;margin:10px 0;min-height:400px;"></div>
<style>
  .d3-compression-perf { font-family: system-ui, -apple-system, sans-serif; position: relative; }
  .d3-compression-perf .d3-tooltip {
    position: absolute; top: 0; left: 0;
    transform: translate(-9999px, -9999px);
    pointer-events: none;
    padding: 10px 14px; border-radius: 10px;
    font-size: 13px; line-height: 1.4;
    border: 1px solid var(--border-color);
    background: var(--surface-bg); color: var(--text-color);
    box-shadow: 0 6px 24px rgba(0,0,0,.22);
    opacity: 0; transition: opacity .12s ease;
    z-index: 20; max-width: 340px;
  }
  .d3-compression-perf .controls {
    display: flex; gap: 16px; align-items: center; justify-content: flex-end; flex-wrap: wrap;
    margin-top: 8px;
  }
  .d3-compression-perf .control-group {
    display: flex; flex-direction: column; align-items: flex-start; gap: 4px;
  }
  .d3-compression-perf .controls label {
    font-size: 13px; font-weight: 700; color: var(--text-color);
  }
  .d3-compression-perf .controls select {
    font-size: 13px; padding: 6px 28px 6px 10px; border: 1px solid var(--border-color);
    border-radius: 8px; background: var(--surface-bg); color: var(--text-color);
    appearance: none; cursor: pointer;
    background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='12' height='12' viewBox='0 0 12 12'%3E%3Cpath d='M3 5l3 3 3-3' stroke='%23888' stroke-width='1.5' fill='none'/%3E%3C/svg%3E");
    background-repeat: no-repeat; background-position: right 8px center;
  }
  .d3-compression-perf .legend {
    display: flex; flex-direction: column; align-items: flex-start; gap: 6px; margin-top: 8px;
  }
  .d3-compression-perf .legend-title { font-size: 13px; font-weight: 700; color: var(--text-color); }
  .d3-compression-perf .legend .items { display: flex; flex-wrap: wrap; gap: 6px 14px; }
  .d3-compression-perf .legend .item {
    display: inline-flex; align-items: center; gap: 6px; white-space: nowrap;
    font-size: 13px; color: var(--text-color); cursor: pointer;
  }
  .d3-compression-perf .legend .swatch {
    width: 14px; height: 14px; border-radius: 3px; border: 1px solid var(--border-color);
  }
</style>
<script>
(() => {
  // Capture the executing <script> now: document.currentScript is null once
  // bootstrap runs later from a load or DOMContentLoaded callback.
  const currentScriptEl = document.currentScript;

  const ensureD3 = (cb) => {
    if (window.d3 && typeof window.d3.select === 'function') return cb();
    let s = document.getElementById('d3-cdn-script');
    if (!s) { s = document.createElement('script'); s.id = 'd3-cdn-script'; s.src = 'https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js'; document.head.appendChild(s); }
    const onReady = () => { if (window.d3 && typeof window.d3.select === 'function') cb(); };
    s.addEventListener('load', onReady, { once: true });
    if (window.d3) onReady();
  };

  const bootstrap = () => {
    const scriptEl = currentScriptEl;
    let container = scriptEl ? scriptEl.previousElementSibling : null;
    while (container && !(container.classList && container.classList.contains('d3-compression-perf'))) {
      container = container.previousElementSibling;
    }
    if (!container) {
      const cs = Array.from(document.querySelectorAll('.d3-compression-perf'))
        .filter(el => !(el.dataset && el.dataset.mounted === 'true'));
      container = cs[cs.length - 1] || null;
    }
    if (!container) return;
    if (container.dataset.mounted === 'true') return;
    container.dataset.mounted = 'true';

    let mountEl = container;
    while (mountEl && !mountEl.getAttribute?.('data-datafiles')) mountEl = mountEl.parentElement;
    const dataAttr = mountEl?.getAttribute?.('data-datafiles');
    const dataPaths = dataAttr
      ? [dataAttr.includes('/') ? dataAttr : `/data/${dataAttr}`]
      : ['/data/rephrasing_metadata.json', './assets/data/rephrasing_metadata.json'];

    const fetchFirst = async (paths) => {
      for (const p of paths) {
        try { const r = await fetch(p, { cache: 'no-cache' }); if (r.ok) return r.json(); } catch(_) {}
      }
      throw new Error('Data not found');
    };

    fetchFirst(dataPaths).then(data => buildChart(data)).catch(err => {
      container.innerHTML = `<pre style="color:red;padding:12px;">Error loading data: ${err.message}</pre>`;
    });
    function buildChart(rawData) {
      const SOURCE_MAP = {
        'fineweb-edu-hq-20BT': 'FW-Edu HQ', 'fineweb-edu-lq-20BT': 'FW-Edu LQ',
        'dclm-37BT': 'DCLM', 'cosmopedia-25BT': 'Cosmopedia'
      };
      const PROMPT_LABELS = {
        'article': 'Article', 'commentary': 'Commentary', 'discussion': 'Discussion',
        'faq': 'FAQ', 'math': 'Math', 'table': 'Table', 'tutorial': 'Tutorial',
        'distill': 'Distill', 'diverse_qa_pairs': 'Diverse QA',
        'extract_knowledge': 'Extract Knowledge', 'knowledge_list': 'Knowledge List',
        'wikipedia_style_rephrasing': 'Wikipedia Style',
        'guided_rewrite_improved': 'Guided Rewrite+', 'guided_rewrite_original': 'Guided Rewrite'
      };

      const getFamily = (m) => {
        const ml = m.toLowerCase();
        if (ml.includes('smollm')) return 'SmolLM2';
        if (ml.includes('gemma')) return 'Gemma';
        if (ml.includes('qwen')) return 'Qwen';
        if (ml.includes('falcon')) return 'Falcon';
        if (ml.includes('granite')) return 'Granite';
        if (ml.includes('llama')) return 'Llama';
        return 'Other';
      };

      // Color by prompt type
      const allPromptKeys = [...new Set(rawData.map(d => d.prompt.split('/')[1].replace('.md', '')))].sort();
      const promptColors = {};
      const cat = window.ColorPalettes ? window.ColorPalettes.getColors('categorical', allPromptKeys.length) : d3.schemeTableau10.concat(d3.schemePastel1);
      allPromptKeys.forEach((k, i) => { promptColors[PROMPT_LABELS[k] || k] = cat[i % cat.length]; });

      const METRIC_NAMES = {
        'agg_score_macro': 'Aggregate Score (Macro)',
        'agg_score_micro': 'Aggregate Score (Micro)',
        'agg_score_RC': 'Reading Comprehension',
        'agg_score_GK': 'General Knowledge',
        'agg_score_NLU': 'Natural Language Understanding',
        'agg_score_MATH': 'Math',
        'agg_score_TABLE': 'Table Understanding',
        'agg_score_RES': 'Reasoning',
        'arc_cf:easy': 'ARC-Easy',
        'drop': 'DROP',
        'gsm8k': 'GSM8K',
        'hellaswag_cf': 'HellaSwag',
        'openbookqa_cf': 'OpenBookQA',
        'piqa_cf': 'PIQA',
        'squad_v2': 'SQuAD v2',
        'treb_qa': 'TriviaQA',
        'wikitablequestions': 'WikiTableQuestions',
        'winogrande_cf': 'Winogrande',
        'xcsqa_cf': 'XCSQA',
        'mmlu_redux_cf:_average': 'MMLU Redux'
      };
      const AGG_ORDER = ['agg_score_macro', 'agg_score_micro', 'agg_score_RC', 'agg_score_GK', 'agg_score_NLU', 'agg_score_MATH', 'agg_score_TABLE', 'agg_score_RES'];
      const metricName = (key) => METRIC_NAMES[key] || key;

      const experiments = rawData.map(d => {
        // prompt is "<category>/<name>.md"; only the file name is needed here
        const [, promptFile] = d.prompt.split('/');
        const promptKey = promptFile.replace('.md', '');
        return {
          run: d.run,
          prompt: PROMPT_LABELS[promptKey] || promptKey,
          model: d.model.split('/').pop(),
          source: SOURCE_MAP[d.source_dataset] || d.source_dataset,
          family: getFamily(d.model),
          compressionRatio: d.compression_ratio,
          inputPerDoc: d.input_token_count_mean,
          outputPerDoc: d.output_token_count_mean,
          tokenReduction: d.token_reduction_mean,
          results: d.results
        };
      });

      // Detect available metrics from first experiment's results
      const allResultKeys = Object.keys(experiments[0].results);
      const aggMetrics = AGG_ORDER.filter(k => allResultKeys.includes(k));
      const indMetrics = allResultKeys.filter(k => !k.startsWith('agg_score'));

      let currentMetric = aggMetrics[0] || allResultKeys[0];

      const svg = d3.select(container).append('svg').attr('width', '100%').style('display', 'block');
      const gGrid = svg.append('g');
      const gRef = svg.append('g');
      const gDots = svg.append('g');
      const gAxes = svg.append('g');
      const gAnnot = svg.append('g');

      let tip = container.querySelector('.d3-tooltip');
      let tipInner;
      if (!tip) {
        tip = document.createElement('div'); tip.className = 'd3-tooltip';
        tipInner = document.createElement('div'); tipInner.className = 'd3-tooltip__inner';
        tipInner.style.textAlign = 'left';
        tip.appendChild(tipInner); container.appendChild(tip);
      } else { tipInner = tip.querySelector('.d3-tooltip__inner') || tip; }

      const margin = { top: 12, right: 16, bottom: 48, left: 56 };
      function render() {
        const width = container.clientWidth || 800;
        const height = Math.max(360, Math.round(width / 2.2));
        svg.attr('width', width).attr('height', height);
        const iw = width - margin.left - margin.right;
        const ih = height - margin.top - margin.bottom;
        const metricLabel = metricName(currentMetric);

        const xExtent = d3.extent(experiments, d => d.compressionRatio);
        const xPad = (xExtent[1] - xExtent[0]) * 0.06;
        const xScale = d3.scaleLinear()
          .domain([xExtent[0] - xPad, xExtent[1] + xPad])
          .range([margin.left, width - margin.right]);

        // Only plot experiments that report the selected metric; a null score would put the dot at y = NaN.
        const plotted = experiments.filter(d => d.results[currentMetric] != null);
        const yVals = plotted.map(d => d.results[currentMetric]);
        const yPad = (d3.max(yVals) - d3.min(yVals)) * 0.08;
        const yScale = d3.scaleLinear()
          .domain([d3.min(yVals) - yPad, d3.max(yVals) + yPad])
          .range([height - margin.bottom, margin.top]);

        // Grid
        const yTicks = yScale.ticks(6);
        gGrid.selectAll('line').data(yTicks).join('line')
          .attr('x1', margin.left).attr('x2', width - margin.right)
          .attr('y1', d => yScale(d)).attr('y2', d => yScale(d))
          .attr('stroke', 'var(--grid-color)').attr('stroke-width', 0.5);

        // Reference line at compression_ratio = 1.0
        gRef.selectAll('*').remove();
        const x1 = xScale(1.0);
        if (x1 > margin.left && x1 < width - margin.right) {
          gRef.append('line')
            .attr('x1', x1).attr('x2', x1)
            .attr('y1', margin.top).attr('y2', height - margin.bottom)
            .attr('stroke', 'var(--muted-color)').attr('stroke-width', 1)
            .attr('stroke-dasharray', '6,4').attr('opacity', 0.5);
        }

        // Annotations
        gAnnot.selectAll('*').remove();
        const annotY = margin.top + 14;
        if (x1 > margin.left + 60 && x1 < width - margin.right - 60) {
          gAnnot.append('text')
            .attr('x', x1 - 10).attr('y', annotY)
            .attr('text-anchor', 'end').attr('fill', 'var(--muted-color)')
            .attr('font-size', '12px').attr('font-style', 'italic')
            .text('\u2190 compression');
          gAnnot.append('text')
            .attr('x', x1 + 10).attr('y', annotY)
            .attr('text-anchor', 'start').attr('fill', 'var(--muted-color)')
            .attr('font-size', '12px').attr('font-style', 'italic')
            .text('expansion \u2192');
        }

        // Axes
        gAxes.selectAll('*').remove();
        gAxes.append('g')
          .attr('transform', `translate(0,${height - margin.bottom})`)
          .call(d3.axisBottom(xScale).ticks(8).tickFormat(d => d.toFixed(2)))
          .call(g => g.select('.domain').attr('stroke', 'var(--axis-color)'))
          .call(g => g.selectAll('.tick line').attr('stroke', 'var(--tick-color)'))
          .call(g => g.selectAll('.tick text').attr('fill', 'var(--tick-color)').attr('font-size', '13px'));

        gAxes.append('g')
          .attr('transform', `translate(${margin.left},0)`)
          .call(d3.axisLeft(yScale).ticks(6).tickFormat(d3.format('.2f')))
          .call(g => g.select('.domain').attr('stroke', 'var(--axis-color)'))
          .call(g => g.selectAll('.tick line').attr('stroke', 'var(--tick-color)'))
          .call(g => g.selectAll('.tick text').attr('fill', 'var(--tick-color)').attr('font-size', '13px'));

        gAxes.append('text')
          .attr('x', margin.left + iw / 2).attr('y', height - 4)
          .attr('text-anchor', 'middle').attr('fill', 'var(--text-color)')
          .attr('font-size', '14px').attr('font-weight', '600')
          .text('Compression ratio (output / input tokens)');

        gAxes.append('text')
          .attr('transform', 'rotate(-90)')
          .attr('x', -(margin.top + ih / 2)).attr('y', 14)
          .attr('text-anchor', 'middle').attr('fill', 'var(--text-color)')
          .attr('font-size', '14px').attr('font-weight', '600')
          .text(metricLabel);

        // Dots
        const rBase = Math.max(5, Math.min(9, width * 0.008));

        gDots.selectAll('circle').data(plotted, d => d.run).join('circle')
          .attr('cx', d => xScale(d.compressionRatio))
          .attr('cy', d => yScale(d.results[currentMetric]))
          .attr('r', rBase)
          .attr('fill', d => promptColors[d.prompt] || '#999')
          .attr('fill-opacity', 0.8)
          .attr('stroke', d => promptColors[d.prompt] || '#999')
          .attr('stroke-width', 1.5)
          .attr('stroke-opacity', 0.3)
          .attr('cursor', 'pointer')
          .on('mouseenter', function(ev, d) {
            d3.select(this).attr('r', rBase * 1.6).attr('fill-opacity', 1).attr('stroke-opacity', 0.8);
            gDots.selectAll('circle').filter(c => c !== d)
              .attr('fill-opacity', 0.15).attr('stroke-opacity', 0.08);
            // Highlight same prompt
            gDots.selectAll('circle').filter(c => c.prompt === d.prompt && c !== d)
              .attr('fill-opacity', 0.5).attr('stroke-opacity', 0.4);
            const score = d.results[currentMetric];
            tipInner.innerHTML =
              `<div style="font-weight:700;font-size:14px;margin-bottom:4px;">${d.prompt}</div>` +
              `<div style="font-size:12px;color:var(--muted-color);margin-bottom:6px;">` +
              `${d.model} · ${d.source}</div>` +
              `<div style="display:grid;grid-template-columns:auto 1fr;gap:2px 10px;font-size:13px;">` +
              `<span style="color:var(--muted-color);">Compression</span><span>${d.compressionRatio.toFixed(3)} (${(d.compressionRatio * 100).toFixed(0)}%)</span>` +
              `<span style="color:var(--muted-color);">Input/doc</span><span>${Math.round(d.inputPerDoc)} tokens</span>` +
              `<span style="color:var(--muted-color);">Output/doc</span><span>${Math.round(d.outputPerDoc)} tokens</span>` +
              `<span style="color:var(--muted-color);">Reduction</span><span>${Math.round(d.tokenReduction)} tokens</span>` +
              `<span style="color:var(--muted-color);">${metricLabel}</span><span style="font-weight:700;">${score != null ? score.toFixed(4) : 'N/A'}</span>` +
              `</div>`;
            tip.style.opacity = '1';
          })
          .on('mousemove', (ev) => {
            const [mx, my] = d3.pointer(ev, container);
            const bw = tip.offsetWidth || 280;
            const bh = tip.offsetHeight || 160;
            const ox = (mx + bw + 20 > width) ? -(bw + 12) : 14;
            const oy = (my + bh + 20 > (height + 60)) ? -(bh + 12) : 14;
            tip.style.transform = `translate(${Math.round(mx + ox)}px,${Math.round(my + oy)}px)`;
          })
          .on('mouseleave', function() {
            gDots.selectAll('circle').attr('r', rBase).attr('fill-opacity', 0.8).attr('stroke-opacity', 0.3);
            tip.style.opacity = '0';
            tip.style.transform = 'translate(-9999px,-9999px)';
          });
      }
      // Controls
      const controls = document.createElement('div'); controls.className = 'controls';
      const cg = document.createElement('div'); cg.className = 'control-group';
      const lbl = document.createElement('label'); lbl.textContent = 'Metric'; lbl.setAttribute('for', 'cp-metric-select');
      const sel = document.createElement('select'); sel.id = 'cp-metric-select';
      const aggGroup = document.createElement('optgroup'); aggGroup.label = 'Aggregate Scores';
      aggMetrics.forEach(k => {
        const opt = document.createElement('option'); opt.value = k; opt.textContent = metricName(k);
        if (k === currentMetric) opt.selected = true;
        aggGroup.appendChild(opt);
      });
      const indGroup = document.createElement('optgroup'); indGroup.label = 'Individual Benchmarks';
      indMetrics.forEach(k => {
        const opt = document.createElement('option'); opt.value = k; opt.textContent = metricName(k);
        if (k === currentMetric) opt.selected = true;
        indGroup.appendChild(opt);
      });
      if (aggGroup.children.length) sel.appendChild(aggGroup);
      if (indGroup.children.length) sel.appendChild(indGroup);
      sel.addEventListener('change', () => { currentMetric = sel.value; render(); });
      cg.appendChild(lbl); cg.appendChild(sel); controls.appendChild(cg); container.appendChild(controls);

      // Legend
      const legend = document.createElement('div'); legend.className = 'legend';
      const ltitle = document.createElement('div'); ltitle.className = 'legend-title'; ltitle.textContent = 'Legend';
      const items = document.createElement('div'); items.className = 'items';
      const usedPrompts = [...new Set(experiments.map(d => d.prompt))].sort();
      usedPrompts.forEach(p => {
        const el = document.createElement('span'); el.className = 'item';
        const sw = document.createElement('span'); sw.className = 'swatch'; sw.style.background = promptColors[p];
        const txt = document.createElement('span'); txt.textContent = p;
        el.appendChild(sw); el.appendChild(txt); items.appendChild(el);
        el.addEventListener('mouseenter', () => {
          svg.selectAll('circle').attr('fill-opacity', d => d.prompt === p ? 0.9 : 0.1)
            .attr('stroke-opacity', d => d.prompt === p ? 0.6 : 0.05);
        });
        el.addEventListener('mouseleave', () => {
          svg.selectAll('circle').attr('fill-opacity', 0.8).attr('stroke-opacity', 0.3);
        });
      });
      // Reference line legend
      const refItem = document.createElement('span'); refItem.className = 'item';
      refItem.innerHTML = `<svg width="20" height="14" style="vertical-align:middle;"><line x1="0" y1="7" x2="20" y2="7" stroke="var(--muted-color)" stroke-width="1" stroke-dasharray="4,3" opacity="0.5"/></svg><span>ratio = 1.0</span>`;
      items.appendChild(refItem);
      legend.appendChild(ltitle); legend.appendChild(items); container.appendChild(legend);

      render();
      if (window.ResizeObserver) new ResizeObserver(() => render()).observe(container);
      else window.addEventListener('resize', render);
    }
  };

  if (document.readyState === 'loading') document.addEventListener('DOMContentLoaded', () => ensureD3(bootstrap), { once: true });
  else ensureD3(bootstrap);
})();
</script>