Analytics: remove Non-evaluative, trim findings to 3 points, add slogan
analytics.py CHANGED (+11 -14)
|
@@ -18,10 +18,9 @@ DATASETS = {
|
|
| 18 |
}
|
| 19 |
|
| 20 |
LABEL_COLORS = {
|
| 21 |
-
"System 1":
|
| 22 |
-
"Mixed":
|
| 23 |
-
"System 2":
|
| 24 |
-
"Non-evaluative": "#94a3b8",
|
| 25 |
}
|
| 26 |
|
| 27 |
CONF_COLORS = {
|
|
@@ -69,7 +68,7 @@ def load_all() -> dict:
|
|
| 69 |
|
| 70 |
def fig_label_distribution(data: dict) -> go.Figure:
|
| 71 |
"""Grouped bar: label distribution per conference."""
|
| 72 |
-
labels_order = ["System 1", "Mixed", "System 2"
|
| 73 |
confs = list(data.keys())
|
| 74 |
|
| 75 |
fig = go.Figure()
|
|
@@ -271,19 +270,17 @@ def build_summary(data: dict) -> str:
|
|
| 271 |
|
| 272 |
|
| 273 |
FINDINGS = """
|
| 274 |
-
### Key Findings
|
| 275 |
|
| 276 |
-
|
| 277 |
|
| 278 |
-
|
| 279 |
|
| 280 |
-
|
| 281 |
|
| 282 |
-
|
| 283 |
|
| 284 |
-
|
| 285 |
|
| 286 |
-
|
| 287 |
-
|
| 288 |
-
7. **Representativeness Heuristic is more prevalent at NeurIPS (27%) than ICLR/ICML (~17–21%).** NeurIPS reviewers more often judge papers by surface similarity to known strong work rather than explicit criteria.
|
| 289 |
"""
|
|
|
|
| 18 |
}
|
| 19 |
|
| 20 |
LABEL_COLORS = {
|
| 21 |
+
"System 1": "#ef4444",
|
| 22 |
+
"Mixed": "#f59e0b",
|
| 23 |
+
"System 2": "#22c55e",
|
|
|
|
| 24 |
}
|
| 25 |
|
| 26 |
CONF_COLORS = {
|
|
|
|
| 68 |
|
| 69 |
def fig_label_distribution(data: dict) -> go.Figure:
|
| 70 |
"""Grouped bar: label distribution per conference."""
|
| 71 |
+
labels_order = ["System 1", "Mixed", "System 2"]
|
| 72 |
confs = list(data.keys())
|
| 73 |
|
| 74 |
fig = go.Figure()
|
|
|
|
| 270 |
|
| 271 |
|
| 272 |
FINDINGS = """
|
| 273 |
+
### Key Findings
|
| 274 |
|
| 275 |
+
*100 papers × 3 conferences, ~1,150 reviews, rated by claude-sonnet-4-6. Papers sampled by stratified random sampling proportional to acceptance tier (Oral / Spotlight / Poster) within each venue.*
|
| 276 |
|
| 277 |
+
1. **ICML and NeurIPS reviewers show more System 2 tendency (~23–26%) than ICLR (16%).** ICML's structured fields (*Claims and Evidence*, *Theoretical Claims*, *Experimental Designs*) appear to scaffold more explicit, decomposed reasoning.
|
| 278 |
|
| 279 |
+
2. **Despite different formats and communities, the overall analytical depth of peer review is remarkably uniform** (RQS 2.80–2.94 / 5), suggesting a field-wide ceiling rather than venue-specific culture.
|
| 280 |
|
| 281 |
+
3. **Decision tier does not predict review quality.** Oral-paper reviews are not systematically stronger than Poster reviews (differences < 0.2 RQS points). Reviewers do not write more analytically for papers they rate highly.
|
| 282 |
|
| 283 |
+
---
|
| 284 |
|
| 285 |
+
> *We are not against AI review. We are against flawed reasoning behind review.*
|
|
|
|
|
|
|
| 286 |
"""
|
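
The commit removes "Non-evaluative" from both `LABEL_COLORS` and `labels_order`, so any review still carrying that label has to be dropped or ignored before the per-conference distribution is computed. The sketch below shows one way that could look. It is not the code in `analytics.py`: the helper name `label_shares` and the shape of `data` (conference name mapped to a flat list of label strings) are assumptions made for illustration, since the diff does not show what `load_all()` returns. Renormalizing over the kept labels is also an assumption.

```python
from collections import Counter

# Colors and label order as they appear on the new side of the diff.
LABEL_COLORS = {
    "System 1": "#ef4444",
    "Mixed": "#f59e0b",
    "System 2": "#22c55e",
}
LABELS_ORDER = list(LABEL_COLORS)


def label_shares(data: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Return the percentage of each kept label per conference.

    ASSUMPTION: `data` maps a conference name to a flat list of label
    strings. The real structure returned by load_all() is not shown.
    """
    shares = {}
    for conf, labels in data.items():
        # Drop anything outside the kept set (for example "Non-evaluative").
        kept = [lab for lab in labels if lab in LABEL_COLORS]
        counts = Counter(kept)
        total = len(kept)
        # Percentages are taken over kept labels only (hypothetical choice).
        shares[conf] = {
            lab: (100.0 * counts[lab] / total if total else 0.0)
            for lab in LABELS_ORDER
        }
    return shares


if __name__ == "__main__":
    # Toy labels, not real review data.
    toy = {"ICLR": ["System 1", "System 1", "Mixed", "Non-evaluative", "System 2"]}
    print(label_shares(toy))
```

Each inner dictionary follows `LABELS_ORDER`, so its values could be passed directly as the bar heights of one conference's trace in a grouped bar chart.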