nuocuhz Claude Sonnet 4.6 committed on
Commit
918804b
·
1 Parent(s): 0ab9a26

Analytics: remove Non-evaluative, trim findings to 3 points, add slogan

Files changed (1)
  1. analytics.py +11 -14
analytics.py CHANGED
@@ -18,10 +18,9 @@ DATASETS = {
 }
 
 LABEL_COLORS = {
-    "System 1": "#ef4444",
-    "Mixed": "#f59e0b",
-    "System 2": "#22c55e",
-    "Non-evaluative": "#94a3b8",
+    "System 1": "#ef4444",
+    "Mixed": "#f59e0b",
+    "System 2": "#22c55e",
 }
 
 CONF_COLORS = {
@@ -69,7 +68,7 @@ def load_all() -> dict:
 
 def fig_label_distribution(data: dict) -> go.Figure:
     """Grouped bar: label distribution per conference."""
-    labels_order = ["System 1", "Mixed", "System 2", "Non-evaluative"]
+    labels_order = ["System 1", "Mixed", "System 2"]
     confs = list(data.keys())
 
     fig = go.Figure()
@@ -271,19 +270,17 @@ def build_summary(data: dict) -> str:
 
 
 FINDINGS = """
-### Key Findings (100 papers × 3 conferences, ~1,150 reviews)
+### Key Findings
 
-1. **Mixed reasoning dominates across all venues (49–57%).** Pure System 1 or System 2 reviews are the minority — most reviewers blend intuitive and analytical modes rather than operating at either extreme.
+*100 papers × 3 conferences, ~1,150 reviews, rated by claude-sonnet-4-6. Papers sampled by stratified random sampling proportional to acceptance tier (Oral / Spotlight / Poster) within each venue.*
 
-2. **ICLR reviewers show more System 1 tendency (35%) than ICML/NeurIPS (~21%).** This may reflect ICLR's open-ended review format, which imposes less structural scaffolding than ICML's field-by-field template — less structure → more impression-driven writing.
+1. **ICML and NeurIPS reviewers show more System 2 tendency (~23–26%) than ICLR (16%).** ICML's structured fields (*Claims and Evidence*, *Theoretical Claims*, *Experimental Designs*) appear to scaffold more explicit, decomposed reasoning.
 
-3. **ICML and NeurIPS reviewers show more System 2 tendency (~23–26%) than ICLR (16%).** ICML's structured fields (*Claims and Evidence*, *Theoretical Claims*, *Experimental Designs*) appear to scaffold more explicit, decomposed reasoning.
+2. **Despite different formats and communities, the overall analytical depth of peer review is remarkably uniform** (RQS 2.80–2.94 / 5), suggesting a field-wide ceiling rather than venue-specific culture.
 
-4. **Reasoning quality (RQS) is nearly identical across venues (2.80–2.94 / 5).** Despite different formats and communities, the overall analytical depth of peer review is remarkably uniform — suggesting a field-wide ceiling rather than venue-specific culture.
+3. **Decision tier does not predict review quality.** Oral-paper reviews are not systematically stronger than Poster reviews (differences < 0.2 RQS points). Reviewers do not write more analytically for papers they rate highly.
 
-5. **Decision tier does not predict review quality.** Oral-paper reviews are not systematically stronger than Poster reviews (differences < 0.2 RQS points). Reviewers do not write more analytically for papers they rate highly.
+---
 
-6. **Checklist Inflation is the dominant bias in all three venues** (50–58% of reviews). Reviewers frequently enumerate specific concerns without analytical linkage, prioritization, or core-claim relevance — mistaking list length for reasoning depth.
+> *We are not against AI review. We are against flawed reasoning behind review.*
-
-7. **Representativeness Heuristic is more prevalent at NeurIPS (27%) than ICLR/ICML (~17–21%).** NeurIPS reviewers more often judge papers by surface similarity to known strong work rather than explicit criteria.
 """
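
Note on the sampling claim: the new FINDINGS text says papers were drawn by stratified random sampling proportional to acceptance tier, but the sampling code is not part of this diff. The following is only a minimal sketch of what such a procedure could look like, not the author's implementation. The function name `stratified_sample`, the `tier` dictionary key, and the tier counts in the usage note are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(papers, n_total, tier_key="tier", seed=0):
    """Draw n_total papers, giving each tier a share proportional to its size.

    Hypothetical helper: `papers` is assumed to be a list of dicts that each
    carry an acceptance tier under `tier_key`.
    """
    rng = random.Random(seed)

    # Group papers by acceptance tier (e.g. Oral / Spotlight / Poster).
    by_tier = defaultdict(list)
    for paper in papers:
        by_tier[paper[tier_key]].append(paper)

    total = len(papers)
    n_total = min(n_total, total)

    # Largest-remainder allocation: floor each proportional share, then hand
    # the remaining slots to the tiers with the largest fractional parts so
    # the quotas sum exactly to n_total.
    exact = {t: n_total * len(group) / total for t, group in by_tier.items()}
    quota = {t: int(share) for t, share in exact.items()}
    leftover = n_total - sum(quota.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - quota[t], reverse=True)
    for t in by_remainder[:leftover]:
        quota[t] += 1

    # Sample without replacement inside each tier.
    sample = []
    for t, group in by_tier.items():
        sample.extend(rng.sample(group, quota[t]))
    return sample
```

With a made-up venue of 10 Oral, 30 Spotlight, and 160 Poster papers, asking for 100 papers would allocate 5, 15, and 80 slots respectively. Fixing `seed` keeps the draw reproducible between runs.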