Add 28 benchmark datasets with rich visualization views
Datasets (28 total):
- Code Generation: REval, HumanEval+, MBPP+, MBPP, ClassEval, LiveCodeBench,
APPS, CodeContests, BigOBench, BigCodeBench, EffiBench, CodeSearchNet
- Code Reasoning: CRUXEval, HumanEvalPack (6 langs)
- Code Editing: SWE-bench Lite/Verified/Full, DebugBench, CanItEdit,
CodeEditorBench, CodeXGLUE Refinement, CommitBench
- Code Completion: SAFIM, HumanEval-X (5 langs)
- Vulnerability Detection: BigVul, DiverseVul, PrimeVul, Devign
View types:
- Simple view (code + inputs/outputs + tests)
- Before/After view with diff highlighting (DebugBench, CanItEdit, etc.)
- GitHub-style diff view with per-file sections and repo/issue/commit links (SWE-bench, CommitBench)
- Multi-language tabs (HumanEval-X, HumanEvalPack with canonical/buggy toggle)
- Fill-in-the-Middle view with inline hole markers (SAFIM)
- Vulnerability view with CWE badges (BigVul, DiverseVul, PrimeVul, Devign)
- Multi-solution view with complexity badges (BigOBench, CodeContests, APPS)
Architecture:
- Refactored to adapters/ package (code_generation, code_editing, code_reasoning, vulnerability)
- Extracted CSS/JS to static/problem.css and static/problem.js
- Deterministic random sampling (seed=42, cap=1000) for large datasets
- Dataset dropdown shows original size for sampled datasets (e.g. '1000 of 33050')
- Compact stats bar with total count and top 5 source tags
- SWE-bench: GitHub-style per-file diff sections with repository/issue/commit links
- SAFIM: inline answer placement at TODO markers instead of end-of-file
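The deterministic sampling noted above (seed=42, cap=1000) can be sketched as follows. This is a minimal sketch: the helper name `sample_indices` is hypothetical; the actual logic lives in `adapters/registration.py`.

```python
import random

def sample_indices(total: int, cap: int = 1000, seed: int = 42) -> list[int]:
    """Pick a stable subset of row indices for a large dataset.

    A fixed seed means every restart shows the same problems, so
    problem URLs stay valid across deploys. (Hypothetical helper.)
    """
    if total <= cap:
        return list(range(total))
    rng = random.Random(seed)  # local RNG: no global random-state side effects
    return sorted(rng.sample(range(total), cap))
```

The "1000 of 33050" dropdown label then falls out of keeping the original `total` alongside the sampled index list.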
- .gitignore +3 -0
- CLAUDE.md +92 -40
- PROGRESS.md +114 -0
- README.md +50 -9
- adapters/__init__.py +82 -0
- adapters/code_editing.py +403 -0
- dataset_adapters.py → adapters/code_generation.py +568 -200
- adapters/code_reasoning.py +366 -0
- adapters/registration.py +410 -0
- adapters/vulnerability.py +245 -0
- app.py +35 -23
- benchmarks_analysis.csv +38 -0
- static/problem.css +587 -0
- static/problem.js +1313 -0
- templates/base.html +154 -0
- templates/index.html +43 -58
- templates/problem.html +5 -972
|
@@ -69,3 +69,6 @@ dmypy.json
|
|
| 69 |
|
| 70 |
# Ruff
|
| 71 |
.ruff_cache/
|
|
|
|
|
|
|
|
|
|
|
|
| 69 |
|
| 70 |
# Ruff
|
| 71 |
.ruff_cache/
|
| 72 |
+
|
| 73 |
+
# AIR
|
| 74 |
+
.air/
|
|
CLAUDE.md
@@ -22,10 +22,14 @@
 - Port: 7860 (default), configurable via PORT env var
 - Debug mode: controlled by FLASK_DEBUG env var
 
-2. **
-   - `DatasetAdapter` base class
-   -
-   -
+2. **adapters/** - Dataset adapter system (modular package)
+   - `__init__.py` - `DatasetAdapter` base class, `REGISTRY` dict, `_set_helpers()` injection
+   - `code_generation.py` - REval, HumanEval+, MBPP+, MBPP, ClassEval, LiveCodeBench, APPS, CodeContests, BigOBench, CodeSearchNet, BigCodeBench, EffiBench
+   - `code_editing.py` - SWE-bench Lite/Verified/Full, DebugBench, CanItEdit, CodeEditorBench, CodeXGLUE Refinement, CommitBench
+   - `code_reasoning.py` - CRUXEval, SAFIM, HumanEval-X, HumanEvalPack
+   - `vulnerability.py` - BigVul, DiverseVul, PrimeVul, Devign
+   - `registration.py` - `register_hf_datasets()`, sampling helpers, JSONL loading
+   - 28 concrete adapters total
   - Each adapter normalizes dataset-specific formats to common API
 
 3. **templates/** - Jinja2 HTML templates
@@ -33,9 +37,13 @@
   - `index.html` - Problem list view with filtering
   - `problem.html` - Problem detail view with syntax highlighting
 
-4. **
+4. **static/** - Frontend assets
+   - `problem.css` - Problem detail page styles
+   - `problem.js` - Problem detail page JavaScript (view rendering, diff, FIM, multi-language)
+
+5. **requirements.txt** / **pyproject.toml** - Dependencies
   - Core: flask, pygments
-   - Optional HF: datasets (for
+   - Optional HF: datasets, huggingface_hub (for all 28 benchmark datasets)
   - Dev: ruff
 
 ### Data Flow
@@ -59,7 +67,7 @@ User Request → Flask Route → Dataset Adapter → API Response → Template/J
 
 ### Python Files
 - **app.py**: Main entry point, Flask routes, ground truth logic
-- **
+- **adapters/**: Adapter package (see Architecture above)
 - **ground_truth_loader.py**: (parent dir) Loads execution traces for DREval
 - **dynamics.py**: (parent dir) Contains `Nil` singleton for missing values
 
@@ -75,25 +83,21 @@ User Request → Flask Route → Dataset Adapter → API Response → Template/J
 
 ## Key Functionalities
 
-### 1. Dataset Support
+### 1. Dataset Support (28 datasets)
 
-**
-- 328 problems (164 HumanEval + 164 ClassEval)
-- Ground truth execution traces available
-- Tasks: Coverage, Path, State, Output predictions
-- Test inputs with expected outputs
+**Code Generation**: REval (154), HumanEval+ (164), MBPP+ (378), MBPP (500), ClassEval (100), LiveCodeBench (1000), APPS (1000), CodeContests (165), BigOBench (556), BigCodeBench (1140), EffiBench (1000)
 
-**
-- Input/Output prediction tasks
-- Single function execution reasoning
+**Code Reasoning**: CRUXEval (800), HumanEvalPack (6x164)
 
-
-- Extended HumanEval with additional tests
-- No execution traces
+**Code Editing**: SWE-bench Lite (300), SWE-bench Verified (500), SWE-bench (1000), DebugBench (1000), CanItEdit (105), CodeEditorBench (1000), CodeXGLUE Refinement (1000), CommitBench (1000)
 
-
-- Algorithm complexity analysis
-- Multiple solutions per problem with time/space complexity labels
+**Code Completion/Translation**: SAFIM (1000), HumanEval-X (5x164), CodeSearchNet (1000)
+
+**Vulnerability Detection**: BigVul (1000), DiverseVul (1000), PrimeVul (1000), Devign (1000)
+
+Note: Large datasets are sampled down to 1000 entries (seed=42) for fast browsing.
+
+REval is the primary dataset with ground truth execution traces. All other datasets are loaded from HuggingFace Hub.
 
 ### 2. Problem Browsing
 
@@ -320,32 +324,80 @@ When making changes, verify:
 - **datasets**: HuggingFace datasets (>=2.14.0, optional)
 - **ruff**: Linting and formatting (>=0.8.0, dev)
 
-### Data Sources
-- **
-- **CRUXEval**: cruxeval-org/cruxeval
-- **HumanEval+**: evalplus/humanevalplus
-- **BigOBench**: facebook/BigOBench
-
-
-
-
--
--
--
--
--
--
--
+### Data Sources (all HuggingFace Hub)
+- **REval**: JetBrains-Research/REval
+- **CRUXEval**: cruxeval-org/cruxeval
+- **HumanEval+**: evalplus/humanevalplus
+- **BigOBench**: facebook/BigOBench
+- **MBPP+**: evalplus/mbppplus
+- **ClassEval**: FudanSELab/ClassEval
+- **LiveCodeBench**: livecodebench/code_generation_lite (via `_load_jsonl_dataset`)
+- **DebugBench**: Rtian/DebugBench
+- **HumanEval-X**: THUDM/humaneval-x (via `_load_jsonl_dataset`)
+- **SWE-bench Lite**: princeton-nlp/SWE-bench_Lite
+- **SWE-bench Verified**: princeton-nlp/SWE-bench_Verified
+- **SWE-bench**: princeton-nlp/SWE-bench
+- **CodeContests**: deepmind/code_contests
+- **APPS**: codeparrot/apps (via `refs/convert/parquet` revision)
+- **CanItEdit**: nuprl/CanItEdit
+- **MBPP**: google-research-datasets/mbpp
+- **SAFIM**: gonglinyuan/safim
+- **BigVul**: bstee615/bigvul
+- **DiverseVul**: claudios/DiverseVul
+- **PrimeVul**: starsofchance/PrimeVul (via direct JSONL loading)
+- **CodeEditorBench**: m-a-p/CodeEditorBench (via `_load_jsonl_dataset` per task type)
+- **CodeSearchNet**: code-search-net/code_search_net
+- **Devign**: google/code_x_glue_cc_defect_detection
+- **BigCodeBench**: bigcode/bigcodebench
+- **HumanEvalPack**: bigcode/humanevalpack (per-language configs)
+- **CodeXGLUE Refinement**: google/code_x_glue_cc_code_refinement
+- **CommitBench**: Maxscha/commitbench
+- **EffiBench**: DONG19/EffiBench
+
+## Benchmark Expansion
+
+### Progress Tracking
+See `PROGRESS.md` for detailed batch plan and status.
+See `benchmarks_analysis.csv` for full analysis of 35+ benchmarks.
+
+### Multi-language Syntax Highlighting
+The `highlight_code()` function in `app.py` accepts an optional `language` parameter
+(default: `"python"`). Supported languages are mapped via `LEXER_MAP` to Pygments lexers.
+Adapters pass the language when calling `_highlight_code(code, language=...)`.
+
+### View Types
+The problem detail page (`problem.html`) supports several view types, dispatched in `renderProblem()`:
+1. **BigOBench view** — multiple solutions with complexity badges
+2. **Simple view** — code + inputs/outputs + test suite (HumanEval+, MBPP+, MBPP, ClassEval, BigCodeBench, EffiBench)
+3. **CRUXEval view** — given/predict task selector
+4. **DREval view** — full interactive view with coverage, arrows, ground truth
+5. **Before/After view** — side-by-side buggy/fixed code (DebugBench, CanItEdit, CodeEditorBench, CodeXGLUE Refinement)
+6. **Multi-language view** — same problem in multiple languages (HumanEval-X, HumanEvalPack with canonical/buggy toggle)
+7. **Diff view** — patch visualization (SWE-bench Lite, SWE-bench Verified, SWE-bench, CommitBench)
+8. **Fill-in-the-Middle view** — prefix + [HOLE] + suffix (SAFIM)
+9. **Vulnerability view** — vulnerable/patched code + CWE labels (BigVul, DiverseVul, PrimeVul, Devign)
+
+### Adding New Datasets (Updated)
+1. Create adapter class in the appropriate `adapters/` submodule inheriting from `DatasetAdapter`
+2. Implement: `problem_count()`, `get_problem_summary()`, `get_problem_detail()`
+3. Set class attributes: `slug`, `display_name`, `has_ground_truth`, `has_tasks`
+4. Import adapter in `adapters/registration.py` and add registration in `register_hf_datasets()` with try/except
+5. If new language: ensure `LEXER_MAP` in `app.py` has the needed lexer
+6. If new view type: add rendering branch in `static/problem.js` `renderProblem()`
+7. Add badge color in `base.html` CSS
+8. Test: `/api/<slug>/problems` and `/api/<slug>/problem/<idx>`
 
 ## Related Documentation
 
 - **README.md**: User-facing documentation, installation instructions
+- **PROGRESS.md**: Batch integration progress and architecture decisions
+- **benchmarks_analysis.csv**: Full benchmark analysis with prioritization
 - **pyproject.toml**: Package metadata, dependencies, ruff configuration
 - **Dockerfile**: Container deployment configuration (if present)
 - **requirements.txt**: Pip-format dependency list
 
 ---
 
-**Last Updated**: 2026-03-
-**Project Status**: Active Development
+**Last Updated**: 2026-03-04
+**Project Status**: Active Development — Benchmark Expansion Phase
 **Primary Maintainer**: Egor Bogomolov
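The adapter contract in the "Adding New Datasets" checklist can be illustrated with a minimal, self-contained sketch. `ToyAdapter` and its single row are invented for illustration; the base-class shape mirrors `DatasetAdapter` in `adapters/__init__.py`, and real adapters load their rows from HuggingFace Hub.

```python
from typing import Any

class DatasetAdapter:
    """Contract mirrored from adapters/__init__.py."""
    slug: str = ""
    display_name: str = ""
    has_ground_truth: bool = False
    has_tasks: bool = False

    def problem_count(self) -> int:
        raise NotImplementedError

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        raise NotImplementedError

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        raise NotImplementedError

class ToyAdapter(DatasetAdapter):
    """Invented example adapter with one in-memory row."""
    slug = "toy"
    display_name = "Toy Dataset"

    def __init__(self) -> None:
        self._rows = [{"task_id": "toy/0", "code": "print('hi')"}]

    def problem_count(self) -> int:
        return len(self._rows)

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        return {"index": idx, "title": self._rows[idx]["task_id"]}

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        return {**self.get_problem_summary(idx), "code": self._rows[idx]["code"]}

REGISTRY: dict[str, DatasetAdapter] = {"toy": ToyAdapter()}
```

The Flask routes only ever call these three methods, which is what lets 28 very different datasets share one problem page.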
PROGRESS.md (new file)
@@ -0,0 +1,114 @@
+# Benchmark Integration Progress
+
+## Status: Batches 1-5 Complete
+
+## Batch Plan
+
+### Batch 1 (Highest Priority -- Easy HF, High Influence)
+| Benchmark | Slug | Status | HF Dataset | View Type |
+|-----------|------|--------|------------|-----------|
+| MBPP+ | `mbppplus` | Done | `evalplus/mbppplus` | Simple |
+| ClassEval | `classeval` | Done | `FudanSELab/ClassEval` | Simple |
+| LiveCodeBench | `livecodebench` | Done | `livecodebench/code_generation_lite` | Simple |
+| DebugBench | `debugbench` | Done | `Rtian/DebugBench` | Before/After |
+| HumanEval-X | `humanevalx` | Done | `THUDM/humaneval-x` | Multi-language |
+
+**Refactoring done:** Multi-language syntax highlighting via `get_lexer_by_name()`. Before/after code diff view. Multi-language tab view.
+
+### Batch 2
+| Benchmark | Slug | Status | HF Dataset | View Type |
+|-----------|------|--------|------------|-----------|
+| SWE-bench Lite | `swebenchlite` | Done | `princeton-nlp/SWE-bench_Lite` | Diff |
+| CodeContests | `codecontests` | Done | `deepmind/code_contests` | Multi-solution |
+| APPS | `apps` | Done | `codeparrot/apps` | Multi-solution / Simple |
+| CanItEdit | `canitedit` | Done | `nuprl/CanItEdit` | Before/After |
+| MBPP | `mbpp` | Done | `google-research-datasets/mbpp` | Simple |
+
+**New views:** Unified diff view for SWE-bench patches. Multi-solution view extended to show language labels for CodeContests.
+
+### Batch 3
+| Benchmark | Slug | Status | HF Dataset | View Type |
+|-----------|------|--------|------------|-----------|
+| SAFIM | `safim` | Done | `gonglinyuan/safim` | Fill-in-the-Middle |
+| BigVul | `bigvul` | Done | `bstee615/bigvul` | Vulnerability |
+| DiverseVul | `diversevul` | Done | `claudios/DiverseVul` | Vulnerability |
+| PrimeVul | `primevul` | Done | `starsofchance/PrimeVul` | Vulnerability |
+| CodeEditorBench | `codeeditorbench` | Done | `m-a-p/CodeEditorBench` | Before/After |
+
+**New views:** Fill-in-the-Middle view showing code with [HOLE] marker and ground truth. Vulnerability view with CWE badges and vulnerable/patched code comparison.
+
+### Batch 4
+| Benchmark | Slug | Status | HF Dataset | View Type |
+|-----------|------|--------|------------|-----------|
+| SWE-bench Verified | `swebenchverified` | Done | `princeton-nlp/SWE-bench_Verified` | Diff |
+| CodeSearchNet | `codesearchnet` | Done | `code-search-net/code_search_net` | Simple |
+| Devign | `devign` | Done | `google/code_x_glue_cc_defect_detection` | Vulnerability |
+
+### Dropped from original plan
+| Benchmark | Reason |
+|-----------|--------|
+| DS-1000 | Complex library-specific format, limited visualization value |
+| RepoBench | Repo-level context too complex for per-problem viewing |
+| MultiPL-E | 22 languages but same problems as HumanEval/MBPP already covered |
+| McEval | Very large (40 languages), complex format |
+| xCodeEval | Very large (25M rows), 7 tasks, too complex |
+| CrossVul | Similar to DiverseVul/BigVul, diminishing returns |
+
+### Batch 5
+| Benchmark | Slug | Status | HF Dataset | View Type |
+|-----------|------|--------|------------|-----------|
+| BigCodeBench | `bigcodebench` | Done | `bigcode/bigcodebench` | Simple |
+| HumanEvalPack | `humanevalpack` | Done | `bigcode/humanevalpack` | Multi-language + Before/After |
+| CodeXGLUE Refinement | `codexgluerefinement` | Done | `google/code_x_glue_cc_code_refinement` | Before/After |
+| SWE-bench | `swebenchfull` | Done | `princeton-nlp/SWE-bench` | Diff |
+| CommitBench | `commitbench` | Done | `Maxscha/commitbench` | Diff |
+| EffiBench | `effibench` | Done | `DONG19/EffiBench` | Simple |
+
+**New views:** Multi-language view with canonical/buggy code toggle (HumanEvalPack). CommitBench reuses diff view. CodeXGLUE Refinement uses before/after Java view.
+
+### Deferred (GitHub-only or complex infrastructure)
+CoderEval, NaturalCodeBench, DevEval, RunBugRun, Defects4J, ConDefects, FixEval, TransCoder, AVATAR, TypeEvalPy, VJBench, SVEN, PyTER
+
+## Architecture Decisions
+
+### Multi-language Support
+- `highlight_code()` in `app.py` accepts `language` parameter (default: `"python"`)
+- Uses `get_lexer_by_name()` from Pygments for automatic lexer selection
+- Adapters pass language when calling `_highlight_code(code, language=...)`
+
+### View Types Implemented
+1. **BigOBench view** -- multiple solutions with complexity badges
+2. **Simple view** -- code + inputs/outputs + test suite (HumanEval+, MBPP+, MBPP, ClassEval, LiveCodeBench, APPS, CodeSearchNet)
+3. **CRUXEval view** -- given/predict task selector
+4. **DREval view** -- full interactive view with coverage, arrows, ground truth
+5. **Before/After view** -- side-by-side buggy/fixed code (DebugBench, CanItEdit, CodeEditorBench)
+6. **Multi-language view** -- same problem in multiple languages (HumanEval-X, HumanEvalPack)
+7. **Diff view** -- unified diff patch visualization (SWE-bench Lite, SWE-bench Verified, SWE-bench, CommitBench)
+8. **Fill-in-the-Middle view** -- prefix + [HOLE] + suffix (SAFIM)
+9. **Vulnerability view** -- vulnerable/patched code + CWE labels (BigVul, DiverseVul, PrimeVul, Devign)
+
+## Total Datasets: 28
+Base (4): REval, CRUXEval, HumanEval+, BigOBench
+Batch 1 (5): MBPP+, ClassEval, LiveCodeBench, DebugBench, HumanEval-X
+Batch 2 (5): SWE-bench Lite, CodeContests, APPS, CanItEdit, MBPP
+Batch 3 (5): SAFIM, BigVul, DiverseVul, PrimeVul, CodeEditorBench
+Batch 4 (3): SWE-bench Verified, CodeSearchNet, Devign
+Batch 5 (6): BigCodeBench, HumanEvalPack, CodeXGLUE Refinement, SWE-bench, CommitBench, EffiBench
+
+## Changelog
+
+- 2026-03-03: Initial benchmark analysis and prioritization complete
+- 2026-03-03: Batch 1 complete (MBPP+, ClassEval, LiveCodeBench, DebugBench, HumanEval-X)
+- 2026-03-03: Batch 2 complete (SWE-bench Lite, CodeContests, APPS, CanItEdit, MBPP)
+- 2026-03-03: Batch 3 complete (SAFIM, BigVul, DiverseVul, PrimeVul, CodeEditorBench)
+- 2026-03-03: Batch 4 complete (SWE-bench Verified, CodeSearchNet, Devign)
+- 2026-03-03: Fixed APPS loading (refs/convert/parquet), PrimeVul (direct JSONL), CodeEditorBench (per-task JSONL)
+- 2026-03-03: All 22 datasets verified loading successfully
+- 2026-03-04: Refactored adapters into submodules (adapters/code_generation.py, code_editing.py, code_reasoning.py, vulnerability.py)
+- 2026-03-04: Extracted CSS and JS into static/ directory (static/problem.css, static/problem.js)
+- 2026-03-04: Added sampling for large datasets (cap at 1000 with seed=42)
+- 2026-03-04: Enhanced FIM view (merged code with ground truth highlighting)
+- 2026-03-04: Enhanced Before/After view (diff highlighting)
+- 2026-03-04: Enhanced SWE-bench diff view (full file with diff chunks)
+- 2026-03-04: Batch 5 complete (BigCodeBench, HumanEvalPack, CodeXGLUE Refinement, SWE-bench, CommitBench, EffiBench)
+- 2026-03-04: All 28 datasets verified loading successfully
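The multi-language highlighting PROGRESS.md describes (Pygments `get_lexer_by_name()`) can be sketched roughly like this; the exact contents of `LEXER_MAP` and the fallback behaviour are assumptions here, not the project's actual code.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name
from pygments.util import ClassNotFound

# Assumed mapping of dataset language names to Pygments lexer aliases.
LEXER_MAP = {"python": "python", "cpp": "cpp", "js": "javascript"}

def highlight_code(code: str, language: str = "python") -> str:
    alias = LEXER_MAP.get(language, language)
    try:
        lexer = get_lexer_by_name(alias)
    except ClassNotFound:
        lexer = get_lexer_by_name("text")  # plain text for unknown languages
    return highlight(code, lexer, HtmlFormatter(nowrap=True))
```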
README.md
@@ -11,14 +11,55 @@ pinned: false
 
 A web-based interface for browsing and manually inspecting individual datapoints from popular ML4SE (Machine Learning for Software Engineering) benchmark datasets.
 
-## Supported Datasets
+## Supported Datasets (28)
 
-
-
-
-| **
-| **
-| **
+### Code Generation
+| Dataset | Source | View Type |
+|---------|--------|-----------|
+| **HumanEval+** | evalplus/humanevalplus | Simple |
+| **MBPP+** | evalplus/mbppplus | Simple |
+| **MBPP** | google-research-datasets/mbpp | Simple |
+| **ClassEval** | FudanSELab/ClassEval | Simple |
+| **LiveCodeBench** | livecodebench/code_generation_lite | Simple |
+| **APPS** | codeparrot/apps | Multi-solution |
+| **CodeContests** | deepmind/code_contests | Multi-solution |
+| **BigOBench** | facebook/BigOBench | Complexity badges |
+| **BigCodeBench** | bigcode/bigcodebench | Simple |
+| **EffiBench** | DONG19/EffiBench | Simple |
+
+### Code Reasoning & Evaluation
+| Dataset | Source | View Type |
+|---------|--------|-----------|
+| **REval** | JetBrains-Research/REval | Interactive (coverage, arrows, ground truth) |
+| **CRUXEval** | cruxeval-org/cruxeval | Given/Predict task selector |
+| **HumanEvalPack** | bigcode/humanevalpack | Multi-language + buggy/canonical |
+
+### Code Editing & Debugging
+| Dataset | Source | View Type |
+|---------|--------|-----------|
+| **SWE-bench Lite** | princeton-nlp/SWE-bench_Lite | Unified diff |
+| **SWE-bench Verified** | princeton-nlp/SWE-bench_Verified | Unified diff |
+| **SWE-bench** | princeton-nlp/SWE-bench | Unified diff |
+| **DebugBench** | Rtian/DebugBench | Before/After |
+| **CanItEdit** | nuprl/CanItEdit | Before/After |
+| **CodeEditorBench** | m-a-p/CodeEditorBench | Before/After |
+| **CodeXGLUE Refinement** | google/code_x_glue_cc_code_refinement | Before/After |
+| **CommitBench** | Maxscha/commitbench | Unified diff |
+
+### Code Completion & Translation
+| Dataset | Source | View Type |
+|---------|--------|-----------|
+| **SAFIM** | gonglinyuan/safim | Fill-in-the-Middle |
+| **HumanEval-X** | THUDM/humaneval-x | Multi-language tabs |
+| **CodeSearchNet** | code-search-net/code_search_net | Simple |
+
+### Vulnerability Detection
+| Dataset | Source | View Type |
+|---------|--------|-----------|
+| **BigVul** | bstee615/bigvul | Vulnerability (CWE badges) |
+| **DiverseVul** | claudios/DiverseVul | Vulnerability |
+| **PrimeVul** | starsofchance/PrimeVul | Vulnerability |
+| **Devign** | google/code_x_glue_cc_defect_detection | Vulnerability |
 
 ## Installation & Usage
 
@@ -45,7 +86,7 @@ uv run ruff format .
 
 ### Adding a New Dataset
 
-1. Create an adapter class in `
+1. Create an adapter class in the appropriate `adapters/` submodule inheriting from `DatasetAdapter`
 2. Implement required methods: `problem_count()`, `get_problem_summary()`, `get_problem_detail()`
-3. Register the adapter in
+3. Register the adapter in `adapters/registration.py`
 4. Test: `/api/<slug>/problems` and `/api/<slug>/problem/<idx>`
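The registration step above pairs with the try/except registration mentioned in CLAUDE.md: one unreachable or broken dataset should not take the whole viewer down. A sketch of that pattern follows; the function and adapter names here are hypothetical.

```python
import logging

log = logging.getLogger(__name__)
REGISTRY: dict[str, object] = {}

def register(slug: str, factory) -> None:
    """Register one adapter; skip it with a warning if its dataset fails to load."""
    try:
        REGISTRY[slug] = factory()
    except Exception as exc:  # any load failure just drops this one dataset
        log.warning("Skipping dataset %r: %s", slug, exc)

class GoodAdapter:
    """Stand-in for a real adapter class."""

def broken_factory():
    raise RuntimeError("HF Hub unreachable")

register("good", GoodAdapter)
register("broken", broken_factory)
```

After startup, only the adapters that loaded successfully appear in the dataset dropdown.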
adapters/__init__.py (new file)
@@ -0,0 +1,82 @@
+"""
+Dataset adapters for the ML4SE Benchmark Viewer.
+
+Each adapter normalises a different benchmark dataset into a common API shape
+so the Flask routes and templates can handle them uniformly.
+
+The REGISTRY dict maps slug strings (used in URLs) to adapter instances.
+"""
+
+from __future__ import annotations
+
+from typing import Any
+
+# ---------------------------------------------------------------------------
+# Helper function stubs – injected at runtime by app.py via _set_helpers()
+# ---------------------------------------------------------------------------
+
+_highlight_code = None
+_code_offset = None
+_extract_test_classes = None
+
+
+def _set_helpers(highlight_code_fn, code_offset_fn, extract_test_classes_fn):
+    """Called once by app.py to inject helper functions."""
+    global _highlight_code, _code_offset, _extract_test_classes
+    _highlight_code = highlight_code_fn
+    _code_offset = code_offset_fn
+    _extract_test_classes = extract_test_classes_fn
+
+    # Propagate to submodules so adapters can use them
+    from adapters import code_editing, code_generation, code_reasoning, vulnerability
+
+    for mod in (code_generation, code_editing, code_reasoning, vulnerability):
+        mod._highlight_code = highlight_code_fn
+        mod._code_offset = code_offset_fn
+        mod._extract_test_classes = extract_test_classes_fn
+
+
+# ---------------------------------------------------------------------------
+# Registry
+# ---------------------------------------------------------------------------
+
+REGISTRY: dict[str, DatasetAdapter] = {}
+
+
+# ---------------------------------------------------------------------------
+# Base class
+# ---------------------------------------------------------------------------
+
+
+class DatasetAdapter:
+    slug: str = ""
+    display_name: str = ""
+    has_ground_truth: bool = False
+    has_tasks: bool = False
+    total_count: int | None = None  # original size before sampling (None = not sampled)
+
+    def problem_count(self) -> int:
+        raise NotImplementedError
+
+    def get_problem_summary(self, idx: int) -> dict[str, Any]:
+        raise NotImplementedError
+
+    def get_problem_detail(self, idx: int) -> dict[str, Any]:
+        raise NotImplementedError
+
+    def get_ground_truth(self, idx: int, input_idx: int) -> dict[str, Any]:
+        return {"status": "unavailable", "message": "Ground truth not available for this dataset"}
+
+
+# ---------------------------------------------------------------------------
+# Re-export registration entry point
+# ---------------------------------------------------------------------------
+
+from adapters.registration import register_hf_datasets  # noqa: E402, F401
+
+__all__ = [
+    "REGISTRY",
+    "DatasetAdapter",
+    "_set_helpers",
+    "register_hf_datasets",
+]
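`_set_helpers()` above injects app-level helpers into the adapter submodules at startup, presumably so the adapters never import `app.py` directly (which imports `adapters`, so a plain import would be circular). The injection pattern itself, isolated here with stand-in modules rather than the real submodules:

```python
import types

# Stand-ins for adapters.code_generation etc.; the real _set_helpers
# assigns onto the actual imported submodules.
code_generation = types.ModuleType("code_generation")
code_editing = types.ModuleType("code_editing")

def _set_helpers(highlight_code_fn):
    """Mimics adapters._set_helpers: copy the helper onto each submodule."""
    for mod in (code_generation, code_editing):
        mod._highlight_code = highlight_code_fn

# app.py would pass its real Pygments-based helper here.
_set_helpers(lambda code, language="python": f"<pre>{code}</pre>")
```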
@@ -0,0 +1,403 @@
"""Code editing benchmark adapters (SWE-bench, DebugBench, CanItEdit, CodeEditorBench)."""

from __future__ import annotations

import json
from typing import Any

from adapters import DatasetAdapter

# Injected at runtime by _set_helpers()
_highlight_code = None
_code_offset = None
_extract_test_classes = None


# ---------------------------------------------------------------------------
# SWE-bench Lite adapter (HuggingFace: princeton-nlp/SWE-bench_Lite)
# ---------------------------------------------------------------------------


class SWEBenchLiteAdapter(DatasetAdapter):
    slug = "swebenchlite"
    display_name = "SWE-bench Lite"
    has_ground_truth = False
    has_tasks = False

    def __init__(self, hf_dataset):
        self._ds = hf_dataset

    def problem_count(self) -> int:
        return len(self._ds)

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        return {
            "idx": idx,
            "task_id": row["instance_id"],
            "entry_point": row["instance_id"].split("__")[-1],
            "num_inputs": 0,
            "source": row["repo"],
        }

    @staticmethod
    def _github_links(instance_id: str, repo: str, base_commit: str) -> dict[str, str]:
        """Build GitHub URLs from SWE-bench instance metadata."""
        links: dict[str, str] = {}
        if repo:
            links["repo_url"] = f"https://github.com/{repo}"
        # instance_id format: "repo__issue-number" e.g. "astropy__astropy-12907"
        parts = instance_id.rsplit("-", 1)
        if len(parts) == 2 and parts[1].isdigit() and repo:
            links["issue_url"] = f"https://github.com/{repo}/issues/{parts[1]}"
        if base_commit and repo:
            links["commit_url"] = f"https://github.com/{repo}/commit/{base_commit}"
        return links

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        patch = row["patch"]
        fail_to_pass = json.loads(row["FAIL_TO_PASS"]) if row["FAIL_TO_PASS"] else []
        pass_to_pass = json.loads(row["PASS_TO_PASS"]) if row["PASS_TO_PASS"] else []
        instance_id = row["instance_id"]
        repo = row["repo"]
        base_commit = row.get("base_commit", "")
        return {
            "idx": idx,
            "task_id": instance_id,
            "entry_point": instance_id.split("__")[-1],
            "code": patch,
            "highlighted_code": "",
            "inputs": [],
            "outputs": [],
            "test": None,
            "tasks": [],
            "source": repo,
            "has_ground_truth": False,
            "has_tasks": False,
            "description": row["problem_statement"],
            "patch": patch,
            "test_patch": row.get("test_patch", ""),
            "fail_to_pass": fail_to_pass,
            "pass_to_pass": pass_to_pass,
            "hints": row.get("hints_text", ""),
            "repo": repo,
            "base_commit": base_commit,
            "version": row.get("version", ""),
            "created_at": row.get("created_at", ""),
            **self._github_links(instance_id, repo, base_commit),
        }


# ---------------------------------------------------------------------------
# SWE-bench Verified adapter (HuggingFace: princeton-nlp/SWE-bench_Verified)
# ---------------------------------------------------------------------------


class SWEBenchVerifiedAdapter(SWEBenchLiteAdapter):
    slug = "swebenchverified"
    display_name = "SWE-bench Verified"


class SWEBenchFullAdapter(SWEBenchLiteAdapter):
    slug = "swebenchfull"
    display_name = "SWE-bench"


# ---------------------------------------------------------------------------
# DebugBench adapter (HuggingFace: Rtian/DebugBench)
# ---------------------------------------------------------------------------


class DebugBenchAdapter(DatasetAdapter):
    slug = "debugbench"
    display_name = "DebugBench"
    has_ground_truth = False
    has_tasks = False

    def __init__(self, hf_dataset):
        self._ds = hf_dataset

    def problem_count(self) -> int:
        return len(self._ds)

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        return {
            "idx": idx,
            "task_id": row["slug"],
            "entry_point": row["slug"],
            "num_inputs": len(row["examples"]),
            "source": f"{row['language']}/{row['category']}",
        }

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        lang = row["language"]
        buggy = row["buggy_code"]
        fixed = row["solution"]
        return {
            "idx": idx,
            "task_id": row["slug"],
            "entry_point": row["slug"],
            "code": fixed,
            "highlighted_code": _highlight_code(fixed, language=lang),
            "inputs": [],
            "outputs": [],
            "test": None,
            "tasks": [],
            "source": f"{lang}/{row['category']}",
            "has_ground_truth": False,
            "has_tasks": False,
            "description": row["question"],
            "language": lang,
            "buggy_code": buggy,
            "buggy_highlighted_code": _highlight_code(buggy, language=lang),
            "fixed_code": fixed,
            "fixed_highlighted_code": _highlight_code(fixed, language=lang),
            "bug_category": row["category"],
            "bug_subtype": row["subtype"],
            "bug_explanation": row["bug_explanation"],
            "difficulty": row["level"],
            "examples": list(row["examples"]),
        }


# ---------------------------------------------------------------------------
# CanItEdit adapter (HuggingFace: nuprl/CanItEdit)
# ---------------------------------------------------------------------------


class CanItEditAdapter(DatasetAdapter):
    slug = "canitedit"
    display_name = "CanItEdit"
    has_ground_truth = False
    has_tasks = False

    def __init__(self, hf_dataset):
        self._ds = hf_dataset

    def problem_count(self) -> int:
        return len(self._ds)

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        taxonomy = row.get("taxonomy", {})
        change_kind = taxonomy.get("change_kind", "") if isinstance(taxonomy, dict) else ""
        return {
            "idx": idx,
            "task_id": row.get("full_name", str(row.get("id", idx))),
            "entry_point": row.get("name", f"edit_{idx}"),
            "num_inputs": 0,
            "source": change_kind or "CanItEdit",
        }

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        before = row["before"]
        after = row["after"]
        taxonomy = row.get("taxonomy", {})
        if not isinstance(taxonomy, dict):
            taxonomy = {}
        return {
            "idx": idx,
            "task_id": row.get("full_name", str(row.get("id", idx))),
            "entry_point": row.get("name", f"edit_{idx}"),
            "code": after,
            "highlighted_code": _highlight_code(after),
            "inputs": [],
            "outputs": [],
            "test": row.get("tests", ""),
            "tasks": [],
            "source": taxonomy.get("change_kind", "CanItEdit"),
            "has_ground_truth": False,
            "has_tasks": False,
            "description": row.get("instruction_descriptive", ""),
            "buggy_code": before,
            "buggy_highlighted_code": _highlight_code(before),
            "fixed_code": after,
            "fixed_highlighted_code": _highlight_code(after),
            "bug_category": taxonomy.get("change_kind", ""),
            "bug_subtype": taxonomy.get("topic", ""),
            "bug_explanation": row.get("instruction_lazy", ""),
        }


# ---------------------------------------------------------------------------
# CodeEditorBench adapter (HuggingFace: m-a-p/CodeEditorBench)
# ---------------------------------------------------------------------------


class CodeEditorBenchAdapter(DatasetAdapter):
    slug = "codeeditorbench"
    display_name = "CodeEditorBench"
    has_ground_truth = False
    has_tasks = False

    def __init__(self, rows: list[dict[str, Any]]):
        self._rows = rows

    def problem_count(self) -> int:
        return len(self._rows)

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        row = self._rows[idx]
        return {
            "idx": idx,
            "task_id": str(row.get("idx", idx)),
            "entry_point": row.get("title", f"problem_{idx}"),
            "num_inputs": 0,
            "source": row.get("_task_type", "unknown"),
        }

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        row = self._rows[idx]
        task_type = row.get("_task_type", "unknown")
        lang = row.get("code_language", row.get("source_lang", "python")) or "python"
        lang_key = lang.lower()

        if task_type == "code_debug":
            buggy = row.get("incorrect_solutions", "")
            fixed = row.get("solutions", "")
        elif task_type == "code_translate":
            buggy = row.get("source_code", "")
            fixed = row.get("solutions", row.get("source_code", ""))
        elif task_type == "code_polishment":
            buggy = row.get("source_code", "")
            fixed = row.get("solutions", row.get("source_code", ""))
        else:  # code_switch
            buggy = row.get("similar_source_code", row.get("source_code", ""))
            fixed = row.get("solutions", row.get("source_code", ""))

        return {
            "idx": idx,
            "task_id": str(row.get("idx", idx)),
            "entry_point": row.get("title", f"problem_{idx}"),
            "code": fixed,
            "highlighted_code": _highlight_code(fixed, language=lang_key) if fixed else "",
            "inputs": [],
            "outputs": [],
            "test": None,
            "tasks": [],
            "source": task_type,
            "has_ground_truth": False,
            "has_tasks": False,
            "description": "",
            "buggy_code": buggy,
            "buggy_highlighted_code": _highlight_code(buggy, language=lang_key) if buggy else "",
            "fixed_code": fixed,
            "fixed_highlighted_code": _highlight_code(fixed, language=lang_key) if fixed else "",
            "bug_category": task_type,
            "bug_subtype": row.get("difficulty", ""),
            "bug_explanation": "",
            "difficulty": row.get("difficulty", ""),
            "language": lang,
        }


# ---------------------------------------------------------------------------
# CodeXGLUE Code Refinement adapter (HuggingFace: google/code_x_glue_cc_code_refinement)
# ---------------------------------------------------------------------------


class CodeXGLUERefinementAdapter(DatasetAdapter):
    slug = "codexgluerefinement"
    display_name = "CodeXGLUE Code Refinement"
    has_ground_truth = False
    has_tasks = False

    def __init__(self, hf_dataset):
        self._ds = hf_dataset

    def problem_count(self) -> int:
        return len(self._ds)

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        return {
            "idx": idx,
            "task_id": str(row.get("id", idx)),
            "entry_point": f"refinement_{row.get('id', idx)}",
            "num_inputs": 0,
            "source": "CodeXGLUE",
        }

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        buggy = row.get("buggy", "")
        fixed = row.get("fixed", "")
        return {
            "idx": idx,
            "task_id": str(row.get("id", idx)),
            "entry_point": f"refinement_{row.get('id', idx)}",
            "code": fixed,
            "highlighted_code": _highlight_code(fixed, language="java") if fixed else "",
            "inputs": [],
            "outputs": [],
            "test": None,
            "tasks": [],
            "source": "CodeXGLUE",
            "has_ground_truth": False,
            "has_tasks": False,
            "description": "",
            "buggy_code": buggy,
            "buggy_highlighted_code": _highlight_code(buggy, language="java") if buggy else "",
            "fixed_code": fixed,
            "fixed_highlighted_code": _highlight_code(fixed, language="java") if fixed else "",
            "bug_category": "Code Refinement",
            "bug_subtype": "",
            "bug_explanation": "",
            "language": "Java",
        }


# ---------------------------------------------------------------------------
# CommitBench adapter (HuggingFace: Maxscha/commitbench)
# ---------------------------------------------------------------------------


class CommitBenchAdapter(DatasetAdapter):
    slug = "commitbench"
    display_name = "CommitBench"
    has_ground_truth = False
    has_tasks = False

    def __init__(self, hf_dataset):
        self._ds = hf_dataset

    def problem_count(self) -> int:
        return len(self._ds)

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        return {
            "idx": idx,
            "task_id": row.get("hash", str(idx))[:12],
            "entry_point": row.get("project", f"commit_{idx}"),
            "num_inputs": 0,
            "source": row.get("diff_languages", "unknown"),
        }

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        diff = row.get("diff", "")
        message = row.get("message", "")
        return {
            "idx": idx,
            "task_id": row.get("hash", str(idx))[:12],
            "entry_point": row.get("project", f"commit_{idx}"),
            "code": diff,
            "highlighted_code": "",
            "inputs": [],
            "outputs": [],
            "test": None,
            "tasks": [],
            "source": row.get("diff_languages", "unknown"),
            "has_ground_truth": False,
            "has_tasks": False,
            "description": message,
            "patch": diff,
            "repo": row.get("project", ""),
            "commit_hash": row.get("hash", ""),
            "diff_languages": row.get("diff_languages", ""),
        }
|
@@ -1,65 +1,24 @@
|
|
| 1 |
-
"""
|
| 2 |
-
Dataset adapters for the ML4SE Benchmark Viewer.
|
| 3 |
-
|
| 4 |
-
Each adapter normalises a different benchmark dataset into a common API shape
|
| 5 |
-
so the Flask routes and templates can handle them uniformly.
|
| 6 |
-
|
| 7 |
-
The REGISTRY dict maps slug strings (used in URLs) to adapter instances.
|
| 8 |
-
"""
|
| 9 |
|
| 10 |
from __future__ import annotations
|
| 11 |
|
| 12 |
import json
|
|
|
|
| 13 |
from typing import Any
|
| 14 |
|
| 15 |
-
|
|
|
|
|
|
|
| 16 |
_highlight_code = None
|
| 17 |
_code_offset = None
|
| 18 |
_extract_test_classes = None
|
| 19 |
|
| 20 |
|
| 21 |
-
def _set_helpers(highlight_code_fn, code_offset_fn, extract_test_classes_fn):
|
| 22 |
-
"""Called once by app.py to inject helper functions."""
|
| 23 |
-
global _highlight_code, _code_offset, _extract_test_classes
|
| 24 |
-
_highlight_code = highlight_code_fn
|
| 25 |
-
_code_offset = code_offset_fn
|
| 26 |
-
_extract_test_classes = extract_test_classes_fn
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
# ---------------------------------------------------------------------------
|
| 30 |
-
# Registry
|
| 31 |
-
# ---------------------------------------------------------------------------
|
| 32 |
-
|
| 33 |
-
REGISTRY: dict[str, "DatasetAdapter"] = {}
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
# ---------------------------------------------------------------------------
|
| 37 |
-
# Base class
|
| 38 |
-
# ---------------------------------------------------------------------------
|
| 39 |
-
|
| 40 |
-
class DatasetAdapter:
|
| 41 |
-
slug: str = ""
|
| 42 |
-
display_name: str = ""
|
| 43 |
-
has_ground_truth: bool = False
|
| 44 |
-
has_tasks: bool = False
|
| 45 |
-
|
| 46 |
-
def problem_count(self) -> int:
|
| 47 |
-
raise NotImplementedError
|
| 48 |
-
|
| 49 |
-
def get_problem_summary(self, idx: int) -> dict[str, Any]:
|
| 50 |
-
raise NotImplementedError
|
| 51 |
-
|
| 52 |
-
def get_problem_detail(self, idx: int) -> dict[str, Any]:
|
| 53 |
-
raise NotImplementedError
|
| 54 |
-
|
| 55 |
-
def get_ground_truth(self, idx: int, input_idx: int) -> dict[str, Any]:
|
| 56 |
-
return {"status": "unavailable", "message": "Ground truth not available for this dataset"}
|
| 57 |
-
|
| 58 |
-
|
| 59 |
# ---------------------------------------------------------------------------
|
| 60 |
# REval adapter (HuggingFace: JetBrains-Research/REval)
|
| 61 |
# ---------------------------------------------------------------------------
|
| 62 |
|
|
|
|
| 63 |
def _format_typed_value(val: dict) -> str:
|
| 64 |
"""Convert a {__type__, __value__} dict from REval states into a Python repr string."""
|
| 65 |
t = val.get("__type__")
|
|
@@ -85,11 +44,9 @@ class REvalAdapter(DatasetAdapter):
|
|
| 85 |
|
| 86 |
def __init__(self, problems_ds, tasks_ds, executions_ds, states_ds):
|
| 87 |
self._problems = problems_ds
|
| 88 |
-
# Build task lookup: task_id → parsed tasks JSON
|
| 89 |
self._tasks: dict[str, list] = {}
|
| 90 |
for row in tasks_ds:
|
| 91 |
self._tasks[row["task_id"]] = json.loads(row["tasks"])
|
| 92 |
-
# Build execution lookup: (task_id, input_idx) → row
|
| 93 |
self._executions: dict[tuple[str, int], dict] = {}
|
| 94 |
for row in executions_ds:
|
| 95 |
self._executions[(row["task_id"], row["input_idx"])] = {
|
|
@@ -97,7 +54,6 @@ class REvalAdapter(DatasetAdapter):
|
|
| 97 |
"trace": row["trace"],
|
| 98 |
"coverage": row["coverage"],
|
| 99 |
}
|
| 100 |
-
# Build states lookup: (task_id, input_idx) → parsed states JSON
|
| 101 |
self._states: dict[tuple[str, int], list] = {}
|
| 102 |
for row in states_ds:
|
| 103 |
self._states[(row["task_id"], row["input_idx"])] = json.loads(row["states"])
|
|
@@ -154,7 +110,7 @@ class REvalAdapter(DatasetAdapter):
|
|
| 154 |
for item in adjusted_items:
|
| 155 |
if "lineno" in item:
|
| 156 |
task_lines.add(item["lineno"])
|
| 157 |
-
task_info["task_lines"] = sorted(
|
| 158 |
|
| 159 |
tasks_info.append(task_info)
|
| 160 |
|
|
@@ -195,11 +151,9 @@ class REvalAdapter(DatasetAdapter):
|
|
| 195 |
code = problem["code"]
|
| 196 |
offset = _code_offset(code)
|
| 197 |
|
| 198 |
-
# Coverage: convert 0-indexed (original) → 1-indexed (stripped display)
|
| 199 |
coverage_1indexed = [ln + 1 - offset for ln in exec_rec["coverage"]]
|
| 200 |
total_lines = len(code[offset:].splitlines())
|
| 201 |
|
| 202 |
-
# Get task items for this input_idx
|
| 203 |
task_list = self._tasks.get(task_id, [])
|
| 204 |
task_items = []
|
| 205 |
for t in task_list:
|
|
@@ -207,15 +161,12 @@ class REvalAdapter(DatasetAdapter):
|
|
| 207 |
task_items = t.get("task", [])
|
| 208 |
break
|
| 209 |
|
| 210 |
-
# Get states for this (task_id, input_idx)
|
| 211 |
states_list = self._states.get((task_id, input_idx), [])
|
| 212 |
|
| 213 |
-
# Resolve variable answers for each task item
|
| 214 |
variable_answers = []
|
| 215 |
for item in task_items:
|
| 216 |
-
lineno = item["lineno"]
|
| 217 |
var = item["var"]
|
| 218 |
-
# Collect all values of this variable at this line across the trace
|
| 219 |
values = []
|
| 220 |
for s in states_list:
|
| 221 |
if s["lineno"] == lineno and var in s.get("locals", {}):
|
|
@@ -226,7 +177,6 @@ class REvalAdapter(DatasetAdapter):
|
|
| 226 |
elif len(values) == 1:
|
| 227 |
answer_str = _format_typed_value(values[0])
|
| 228 |
else:
|
| 229 |
-
# Deduplicate by formatted string to avoid showing identical values
|
| 230 |
seen = []
|
| 231 |
for v in values:
|
| 232 |
fmt = _format_typed_value(v)
|
|
@@ -234,13 +184,14 @@ class REvalAdapter(DatasetAdapter):
|
|
| 234 |
seen.append(fmt)
|
| 235 |
answer_str = "[" + ", ".join(seen) + "]" if len(seen) > 1 else seen[0]
|
| 236 |
|
| 237 |
-
variable_answers.append(
|
| 238 |
-
|
| 239 |
-
|
| 240 |
-
|
| 241 |
-
|
|
|
|
|
|
|
| 242 |
|
| 243 |
-
# Resolve next lines from trace for arrow visualization
|
| 244 |
trace = exec_rec["trace"]
|
| 245 |
next_lines_answers = []
|
| 246 |
processed_linenos: set[int] = set()
|
|
@@ -253,10 +204,12 @@ class REvalAdapter(DatasetAdapter):
|
|
| 253 |
for i, ln in enumerate(trace):
|
| 254 |
if ln == lineno and i + 1 < len(trace):
|
| 255 |
nexts.add(trace[i + 1])
|
| 256 |
-
next_lines_answers.append(
|
| 257 |
-
|
| 258 |
-
|
| 259 |
-
|
|
|
|
|
|
|
| 260 |
|
| 261 |
return {
|
| 262 |
"status": "ok",
|
|
@@ -267,72 +220,11 @@ class REvalAdapter(DatasetAdapter):
|
|
| 267 |
}
|
| 268 |
|
| 269 |
|
| 270 |
-
# ---------------------------------------------------------------------------
|
| 271 |
-
# CRUXEval adapter (HuggingFace: cruxeval-org/cruxeval)
|
| 272 |
-
# ---------------------------------------------------------------------------
|
| 273 |
-
|
| 274 |
-
class CRUXEvalAdapter(DatasetAdapter):
|
| 275 |
-
slug = "cruxeval"
|
| 276 |
-
display_name = "CRUXEval"
|
| 277 |
-
has_ground_truth = False
|
| 278 |
-
has_tasks = True
|
| 279 |
-
|
| 280 |
-
def __init__(self, hf_dataset):
|
| 281 |
-
self._ds = hf_dataset
|
| 282 |
-
|
| 283 |
-
def problem_count(self) -> int:
|
| 284 |
-
return len(self._ds)
|
| 285 |
-
|
| 286 |
-
def get_problem_summary(self, idx: int) -> dict[str, Any]:
|
| 287 |
-
row = self._ds[idx]
|
| 288 |
-
return {
|
| 289 |
-
"idx": idx,
|
| 290 |
-
"task_id": row["id"],
|
| 291 |
-
"entry_point": "f",
|
| 292 |
-
"num_inputs": 1,
|
| 293 |
-
"source": "CRUXEval",
|
| 294 |
-
}
|
| 295 |
-
|
| 296 |
-
def get_problem_detail(self, idx: int) -> dict[str, Any]:
|
| 297 |
-
row = self._ds[idx]
|
| 298 |
-
code = row["code"]
|
| 299 |
-
return {
|
| 300 |
-
"idx": idx,
|
| 301 |
-
"task_id": row["id"],
|
| 302 |
-
"entry_point": "f",
|
| 303 |
-
"code": code,
|
| 304 |
-
"highlighted_code": _highlight_code(code),
|
| 305 |
-
"inputs": [row["input"]],
|
| 306 |
-
"outputs": [row["output"]],
|
| 307 |
-
"test": None,
|
| 308 |
-
"tasks": [
|
| 309 |
-
{
|
| 310 |
-
"name": "Output Prediction",
|
| 311 |
-
"description": "Given the code and input, predict the output.",
|
| 312 |
-
"given": "input",
|
| 313 |
-
"predict": "output",
|
| 314 |
-
"input": row["input"],
|
| 315 |
-
"output": row["output"],
|
| 316 |
-
},
|
| 317 |
-
{
|
| 318 |
-
"name": "Input Prediction",
|
| 319 |
-
"description": "Given the code and output, predict the input.",
|
| 320 |
-
"given": "output",
|
| 321 |
-
"predict": "input",
|
| 322 |
-
"input": row["input"],
|
| 323 |
-
"output": row["output"],
|
| 324 |
-
},
|
| 325 |
-
],
|
| 326 |
-
"source": "CRUXEval",
|
| 327 |
-
"has_ground_truth": False,
|
| 328 |
-
"has_tasks": True,
|
| 329 |
-
}
|
| 330 |
-
|
| 331 |
-
|
| 332 |
# ---------------------------------------------------------------------------
|
| 333 |
# HumanEval+ adapter (HuggingFace: evalplus/humanevalplus)
|
| 334 |
# ---------------------------------------------------------------------------
|
| 335 |
|
|
|
|
| 336 |
class HumanEvalPlusAdapter(DatasetAdapter):
|
| 337 |
slug = "humanevalplus"
|
| 338 |
display_name = "HumanEval+"
|
|
@@ -378,6 +270,7 @@ class HumanEvalPlusAdapter(DatasetAdapter):
|
|
| 378 |
# BigOBench adapter (HuggingFace: facebook/BigOBench)
|
| 379 |
# ---------------------------------------------------------------------------
|
| 380 |
|
|
|
|
| 381 |
class BigOBenchAdapter(DatasetAdapter):
|
| 382 |
slug = "bigobench"
|
| 383 |
display_name = "BigOBench"
|
|
@@ -404,13 +297,15 @@ class BigOBenchAdapter(DatasetAdapter):
|
|
| 404 |
prob = self._problems[idx]
|
| 405 |
solutions = []
|
| 406 |
for sol in prob["solutions"]:
|
| 407 |
-
solutions.append(
|
| 408 |
-
|
| 409 |
-
|
| 410 |
-
|
| 411 |
-
|
| 412 |
-
|
| 413 |
-
|
|
|
|
|
|
|
| 414 |
return {
|
| 415 |
"idx": idx,
|
| 416 |
"task_id": prob["problem_id"],
|
|
@@ -429,16 +324,9 @@ class BigOBenchAdapter(DatasetAdapter):
|
|
| 429 |
}
|
| 430 |
|
| 431 |
|
| 432 |
-
def
|
| 433 |
-
"""Merge time and space complexity test sets by problem_id.
|
| 434 |
-
|
| 435 |
-
Groups all solutions under their parent problem. Solutions that appear
|
| 436 |
-
in both test sets get both complexity labels; otherwise the missing one
|
| 437 |
-
is None. Returns a list of problem dicts sorted by problem_id.
|
| 438 |
-
"""
|
| 439 |
-
# First, collect solutions keyed by (problem_id, solution_id)
|
| 440 |
solutions: dict[tuple[str, str], dict[str, Any]] = {}
|
| 441 |
-
# Track problem-level metadata
|
| 442 |
problem_meta: dict[str, dict[str, str]] = {}
|
| 443 |
|
| 444 |
for row in ds_time:
|
|
@@ -456,10 +344,13 @@ def _merge_bigobench(ds_time, ds_space) -> list[dict[str, Any]]:
|
|
| 456 |
|
| 457 |
for row in ds_space:
|
| 458 |
pid, sid = row["problem_id"], row["solution_id"]
|
| 459 |
-
problem_meta.setdefault(
|
| 460 |
-
|
| 461 |
-
|
| 462 |
-
|
|
|
|
|
|
|
|
|
|
| 463 |
key = (pid, sid)
|
| 464 |
if key in solutions:
|
| 465 |
solutions[key]["space_complexity"] = row["space_complexity_inferred"]
|
|
@@ -471,8 +362,6 @@ def _merge_bigobench(ds_time, ds_space) -> list[dict[str, Any]]:
|
|
| 471 |
"space_complexity": row["space_complexity_inferred"],
|
| 472 |
}
|
| 473 |
|
| 474 |
-
# Group solutions by problem_id
|
| 475 |
-
from collections import defaultdict
|
| 476 |
by_problem: dict[str, list[dict[str, Any]]] = defaultdict(list)
|
| 477 |
for (pid, _sid), sol in solutions.items():
|
| 478 |
by_problem[pid].append(sol)
|
|
@@ -480,58 +369,537 @@ def _merge_bigobench(ds_time, ds_space) -> list[dict[str, Any]]:
|
|
| 480 |
problems = []
|
| 481 |
for pid in sorted(by_problem.keys()):
|
| 482 |
meta = problem_meta[pid]
|
| 483 |
-
problems.append(
|
| 484 |
-
|
| 485 |
-
|
| 486 |
-
|
| 487 |
-
|
| 488 |
-
|
|
|
|
|
|
|
| 489 |
|
| 490 |
return problems
|
| 491 |
|
| 492 |
|
| 493 |
# ---------------------------------------------------------------------------
|
| 494 |
-
#
|
| 495 |
# ---------------------------------------------------------------------------
|
| 496 |
|
| 497 |
-
|
| 498 |
-
|
| 499 |
-
|
| 500 |
-
|
| 501 |
-
|
| 502 |
-
|
| 503 |
-
|
| 504 |
-
|
| 505 |
-
|
| 506 |
-
|
| 507 |
-
|
| 508 |
-
|
| 509 |
-
|
| 510 |
-
|
| 511 |
-
|
| 512 |
-
|
| 513 |
-
|
| 514 |
-
|
| 515 |
-
|
| 516 |
-
|
| 517 |
-
|
| 518 |
-
|
| 519 |
-
|
| 520 |
-
|
| 521 |
-
|
| 522 |
-
|
| 523 |
-
|
| 524 |
-
|
| 525 |
-
|
| 526 |
-
|
| 527 |
-
"
|
| 528 |
-
|
| 529 |
-
|
| 530 |
-
"
|
| 531 |
-
|
| 532 |
-
|
| 533 |
-
|
| 534 |
-
|
| 535 |
-
|
| 536 |
-
|
| 537 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
+"""Code generation benchmark adapters."""

 from __future__ import annotations

 import json
+from collections import defaultdict
 from typing import Any

+from adapters import DatasetAdapter
+
+# Injected at runtime by _set_helpers()
 _highlight_code = None
 _code_offset = None
 _extract_test_classes = None

 # ---------------------------------------------------------------------------
 # REval adapter (HuggingFace: JetBrains-Research/REval)
 # ---------------------------------------------------------------------------

+
 def _format_typed_value(val: dict) -> str:
     """Convert a {__type__, __value__} dict from REval states into a Python repr string."""
     t = val.get("__type__")
⋮

     def __init__(self, problems_ds, tasks_ds, executions_ds, states_ds):
         self._problems = problems_ds
         self._tasks: dict[str, list] = {}
         for row in tasks_ds:
             self._tasks[row["task_id"]] = json.loads(row["tasks"])
         self._executions: dict[tuple[str, int], dict] = {}
         for row in executions_ds:
             self._executions[(row["task_id"], row["input_idx"])] = {
⋮
                 "trace": row["trace"],
                 "coverage": row["coverage"],
             }
         self._states: dict[tuple[str, int], list] = {}
         for row in states_ds:
             self._states[(row["task_id"], row["input_idx"])] = json.loads(row["states"])
⋮
         for item in adjusted_items:
             if "lineno" in item:
                 task_lines.add(item["lineno"])
+        task_info["task_lines"] = sorted(task_lines)

         tasks_info.append(task_info)

⋮
         code = problem["code"]
         offset = _code_offset(code)

         coverage_1indexed = [ln + 1 - offset for ln in exec_rec["coverage"]]
         total_lines = len(code[offset:].splitlines())

         task_list = self._tasks.get(task_id, [])
         task_items = []
         for t in task_list:
⋮
                 task_items = t.get("task", [])
                 break

         states_list = self._states.get((task_id, input_idx), [])

         variable_answers = []
         for item in task_items:
+            lineno = item["lineno"]
             var = item["var"]
             values = []
             for s in states_list:
                 if s["lineno"] == lineno and var in s.get("locals", {}):
⋮
             elif len(values) == 1:
                 answer_str = _format_typed_value(values[0])
             else:
                 seen = []
                 for v in values:
                     fmt = _format_typed_value(v)
⋮
                         seen.append(fmt)
                 answer_str = "[" + ", ".join(seen) + "]" if len(seen) > 1 else seen[0]

+            variable_answers.append(
+                {
+                    "lineno": lineno - offset,
+                    "var": var,
+                    "answer_str": answer_str,
+                }
+            )

         trace = exec_rec["trace"]
         next_lines_answers = []
         processed_linenos: set[int] = set()
⋮
             for i, ln in enumerate(trace):
                 if ln == lineno and i + 1 < len(trace):
                     nexts.add(trace[i + 1])
+            next_lines_answers.append(
+                {
+                    "lineno": lineno,
+                    "next_lines": sorted(nexts) if nexts else [-1],
+                }
+            )

         return {
             "status": "ok",
⋮
         }

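REval's execution states serialize runtime values as `{__type__, __value__}` dicts. The body of `_format_typed_value` is collapsed in the diff above; as an illustration only (a hypothetical sketch of the idea, not the repository's actual implementation), such a formatter could look like:

```python
# Hypothetical sketch of formatting REval-style {__type__, __value__} records.
# The real _format_typed_value body is collapsed in the diff; this handling is illustrative.
def format_typed_value(val: dict) -> str:
    t = val.get("__type__")
    v = val.get("__value__")
    if t == "str":
        return repr(v)  # quote strings like Python's repr would
    if t == "list":
        # recurse into nested typed records
        return "[" + ", ".join(format_typed_value(x) for x in v) + "]"
    return str(v)  # ints, floats, bools, None fall through


nested = {"__type__": "list", "__value__": [
    {"__type__": "str", "__value__": "a"},
    {"__type__": "int", "__value__": 3},
]}
formatted = format_typed_value(nested)  # "['a', 3]"
```

The point of producing repr-style strings is that the viewer can show the answer exactly as a Python literal next to the traced line.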
 # ---------------------------------------------------------------------------
 # HumanEval+ adapter (HuggingFace: evalplus/humanevalplus)
 # ---------------------------------------------------------------------------

+
 class HumanEvalPlusAdapter(DatasetAdapter):
     slug = "humanevalplus"
     display_name = "HumanEval+"
⋮
 # BigOBench adapter (HuggingFace: facebook/BigOBench)
 # ---------------------------------------------------------------------------

+
 class BigOBenchAdapter(DatasetAdapter):
     slug = "bigobench"
     display_name = "BigOBench"
⋮
         prob = self._problems[idx]
         solutions = []
         for sol in prob["solutions"]:
+            solutions.append(
+                {
+                    "solution_id": sol["solution_id"],
+                    "code": sol["solution_code"],
+                    "highlighted_code": _highlight_code(sol["solution_code"]),
+                    "time_complexity": sol.get("time_complexity"),
+                    "space_complexity": sol.get("space_complexity"),
+                }
+            )
         return {
             "idx": idx,
             "task_id": prob["problem_id"],
⋮
         }


+def merge_bigobench(ds_time, ds_space) -> list[dict[str, Any]]:
+    """Merge time and space complexity test sets by problem_id."""
     solutions: dict[tuple[str, str], dict[str, Any]] = {}
     problem_meta: dict[str, dict[str, str]] = {}

     for row in ds_time:
⋮
     for row in ds_space:
         pid, sid = row["problem_id"], row["solution_id"]
+        problem_meta.setdefault(
+            pid,
+            {
+                "problem_name": row["problem_name"],
+                "description": row["description"],
+            },
+        )
         key = (pid, sid)
         if key in solutions:
             solutions[key]["space_complexity"] = row["space_complexity_inferred"]
⋮
             "space_complexity": row["space_complexity_inferred"],
         }

     by_problem: dict[str, list[dict[str, Any]]] = defaultdict(list)
     for (pid, _sid), sol in solutions.items():
         by_problem[pid].append(sol)
⋮
     problems = []
     for pid in sorted(by_problem.keys()):
         meta = problem_meta[pid]
+        problems.append(
+            {
+                "problem_id": pid,
+                "problem_name": meta["problem_name"],
+                "description": meta["description"],
+                "solutions": by_problem[pid],
+            }
+        )

     return problems

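`merge_bigobench` joins the two test splits on `(problem_id, solution_id)` and then groups solutions per problem. A simplified sketch on toy rows (plain dicts stand in for HuggingFace datasets; the `time_complexity_inferred` field name on the time side is an assumption, since the diff only shows the space side):

```python
from collections import defaultdict

# Toy rows mimicking the two BigOBench splits (field names partly assumed).
time_rows = [{"problem_id": "p1", "solution_id": "s1",
              "problem_name": "Sum", "description": "add numbers",
              "time_complexity_inferred": "O(n)"}]
space_rows = [{"problem_id": "p1", "solution_id": "s1",
               "problem_name": "Sum", "description": "add numbers",
               "space_complexity_inferred": "O(1)"}]

# Key solutions by (problem_id, solution_id), as merge_bigobench does.
solutions = {}
for row in time_rows:
    solutions[(row["problem_id"], row["solution_id"])] = {
        "solution_id": row["solution_id"],
        "time_complexity": row["time_complexity_inferred"],
    }
for row in space_rows:
    key = (row["problem_id"], row["solution_id"])
    if key in solutions:
        # same solution seen in both splits: attach the space side
        solutions[key]["space_complexity"] = row["space_complexity_inferred"]
    else:
        solutions[key] = {"solution_id": row["solution_id"],
                          "space_complexity": row["space_complexity_inferred"]}

# Regroup per problem for the multi-solution view.
by_problem = defaultdict(list)
for (pid, _sid), sol in solutions.items():
    by_problem[pid].append(sol)
```

A solution that appears in both splits ends up with both complexity badges, which is what the multi-solution view renders.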
 # ---------------------------------------------------------------------------
+# MBPP+ adapter (HuggingFace: evalplus/mbppplus)
 # ---------------------------------------------------------------------------

+
+class MBPPPlusAdapter(DatasetAdapter):
+    slug = "mbppplus"
+    display_name = "MBPP+"
+    has_ground_truth = False
+    has_tasks = False
+
+    def __init__(self, hf_dataset):
+        self._ds = hf_dataset
+
+    def problem_count(self) -> int:
+        return len(self._ds)
+
+    def get_problem_summary(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        return {
+            "idx": idx,
+            "task_id": str(row["task_id"]),
+            "entry_point": row["prompt"][:60].replace("\n", " ").strip(),
+            "num_inputs": len(row["test_list"]),
+            "source": "MBPP+",
+        }
+
+    def get_problem_detail(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        code = row["code"]
+        return {
+            "idx": idx,
+            "task_id": str(row["task_id"]),
+            "entry_point": row["prompt"][:60].replace("\n", " ").strip(),
+            "code": code,
+            "highlighted_code": _highlight_code(code),
+            "inputs": [],
+            "outputs": [],
+            "test": "\n".join(row["test_list"]),
+            "tasks": [],
+            "source": "MBPP+",
+            "has_ground_truth": False,
+            "has_tasks": False,
+            "description": row["prompt"],
+        }
+
+
+# ---------------------------------------------------------------------------
+# ClassEval adapter (HuggingFace: FudanSELab/ClassEval)
+# ---------------------------------------------------------------------------
+
+
+class ClassEvalAdapter(DatasetAdapter):
+    slug = "classeval"
+    display_name = "ClassEval"
+    has_ground_truth = False
+    has_tasks = False
+
+    def __init__(self, hf_dataset):
+        self._ds = hf_dataset
+
+    def problem_count(self) -> int:
+        return len(self._ds)
+
+    def get_problem_summary(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        return {
+            "idx": idx,
+            "task_id": row["task_id"],
+            "entry_point": row["class_name"],
+            "num_inputs": len(row["methods_info"]),
+            "source": "ClassEval",
+        }
+
+    def get_problem_detail(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        code = row["solution_code"]
+        return {
+            "idx": idx,
+            "task_id": row["task_id"],
+            "entry_point": row["class_name"],
+            "code": code,
+            "highlighted_code": _highlight_code(code),
+            "inputs": [],
+            "outputs": [],
+            "test": row["test"],
+            "tasks": [],
+            "source": "ClassEval",
+            "has_ground_truth": False,
+            "has_tasks": False,
+            "description": row["class_description"],
+            "skeleton": row["skeleton"],
+        }
+
+
+# ---------------------------------------------------------------------------
+# LiveCodeBench adapter (HuggingFace: livecodebench/code_generation_lite)
+# ---------------------------------------------------------------------------
+
+
+class LiveCodeBenchAdapter(DatasetAdapter):
+    slug = "livecodebench"
+    display_name = "LiveCodeBench"
+    has_ground_truth = False
+    has_tasks = False
+
+    def __init__(self, hf_dataset):
+        self._ds = hf_dataset
+
+    def problem_count(self) -> int:
+        return len(self._ds)
+
+    def get_problem_summary(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        return {
+            "idx": idx,
+            "task_id": row["question_id"],
+            "entry_point": row["question_title"],
+            "num_inputs": 0,
+            "source": row["platform"],
+        }
+
+    def get_problem_detail(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        test_cases = []
+        try:
+            test_cases = json.loads(row["public_test_cases"]) if row["public_test_cases"] else []
+        except (json.JSONDecodeError, TypeError):
+            pass
+
+        inputs = [tc.get("input", "") for tc in test_cases]
+        outputs = [tc.get("output", "") for tc in test_cases]
+
+        starter = row.get("starter_code", "") or ""
+        code = starter if starter.strip() else ""
+
+        return {
+            "idx": idx,
+            "task_id": row["question_id"],
+            "entry_point": row["question_title"],
+            "code": code,
+            "highlighted_code": _highlight_code(code) if code else "",
+            "inputs": inputs,
+            "outputs": outputs,
+            "test": None,
+            "tasks": [],
+            "source": row["platform"],
+            "has_ground_truth": False,
+            "has_tasks": False,
+            "description": row["question_content"],
+            "difficulty": row.get("difficulty", ""),
+            "contest_date": row.get("contest_date", ""),
+        }
+
+
+# ---------------------------------------------------------------------------
+# CodeContests adapter (HuggingFace: deepmind/code_contests)
+# ---------------------------------------------------------------------------
+
+_CC_LANG_NAMES = {0: "Unknown", 1: "Python 2", 2: "C++", 3: "Python 3", 4: "Java"}
+
+
+class CodeContestsAdapter(DatasetAdapter):
+    slug = "codecontests"
+    display_name = "CodeContests"
+    has_ground_truth = False
+    has_tasks = False
+
+    _DIFFICULTY_NAMES = {
+        0: "Unknown",
+        1: "Easy",
+        2: "Medium",
+        3: "Hard",
+        4: "Harder",
+        5: "Hardest",
+        6: "External",
+    }
+    _SOURCE_NAMES = {
+        0: "Unknown",
+        1: "CodeChef",
+        2: "Codeforces",
+        3: "HackerEarth",
+        4: "CodeJam",
+        5: "AtCoder",
+        6: "Aizu",
+    }
+
+    def __init__(self, hf_dataset):
+        self._ds = hf_dataset
+
+    def problem_count(self) -> int:
+        return len(self._ds)
+
+    def get_problem_summary(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        source_int = row.get("source", 0)
+        source_name = self._SOURCE_NAMES.get(source_int, "Unknown")
+        return {
+            "idx": idx,
+            "task_id": row["name"],
+            "entry_point": row["name"],
+            "num_inputs": len(row.get("public_tests", {}).get("input", [])),
+            "source": source_name,
+        }
+
+    def get_problem_detail(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        source_int = row.get("source", 0)
+        source_name = self._SOURCE_NAMES.get(source_int, "Unknown")
+        diff_int = row.get("difficulty", 0)
+        diff_name = self._DIFFICULTY_NAMES.get(diff_int, "Unknown")
+
+        sols_data = row.get("solutions", {})
+        sol_langs = sols_data.get("language", [])
+        sol_codes = sols_data.get("solution", [])
+        solutions = []
+        for i, code in enumerate(sol_codes[:10]):
+            lang_int = sol_langs[i] if i < len(sol_langs) else 0
+            lang_name = _CC_LANG_NAMES.get(lang_int, "Unknown")
+            lang_key = {1: "python", 2: "cpp", 3: "python", 4: "java"}.get(lang_int, "python")
+            solutions.append(
+                {
+                    "solution_id": f"sol_{i}",
+                    "code": code,
+                    "highlighted_code": _highlight_code(code, language=lang_key),
+                    "language": lang_name,
+                }
+            )
+
+        pub_tests = row.get("public_tests", {})
+        inputs = pub_tests.get("input", [])
+        outputs = pub_tests.get("output", [])
+        tags = list(row.get("cf_tags", []))
+
+        return {
+            "idx": idx,
+            "task_id": row["name"],
+            "entry_point": row["name"],
+            "code": solutions[0]["code"] if solutions else "",
+            "highlighted_code": solutions[0]["highlighted_code"] if solutions else "",
+            "inputs": inputs,
+            "outputs": outputs,
+            "test": None,
+            "tasks": [],
+            "source": source_name,
+            "has_ground_truth": False,
+            "has_tasks": False,
+            "description": row["description"],
+            "difficulty": diff_name,
+            "solutions": solutions,
+            "cf_rating": row.get("cf_rating", 0),
+            "tags": tags,
+        }
+
+
+# ---------------------------------------------------------------------------
+# APPS adapter (HuggingFace: codeparrot/apps)
+# ---------------------------------------------------------------------------
+
+
+class APPSAdapter(DatasetAdapter):
+    slug = "apps"
+    display_name = "APPS"
+    has_ground_truth = False
+    has_tasks = False
+
+    def __init__(self, hf_dataset):
+        self._ds = hf_dataset
+
+    def problem_count(self) -> int:
+        return len(self._ds)
+
+    def get_problem_summary(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        return {
+            "idx": idx,
+            "task_id": str(row["problem_id"]),
+            "entry_point": row["question"][:60].replace("\n", " ").strip(),
+            "num_inputs": 0,
+            "source": row.get("difficulty", "unknown"),
+        }
+
+    def get_problem_detail(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        solutions = []
+        if row.get("solutions"):
+            try:
+                sol_list = json.loads(row["solutions"])
+                for i, code in enumerate(sol_list[:5]):
+                    solutions.append(
+                        {
+                            "solution_id": f"sol_{i}",
+                            "code": code,
+                            "highlighted_code": _highlight_code(code),
+                        }
+                    )
+            except (json.JSONDecodeError, TypeError):
+                pass
+
+        inputs, outputs = [], []
+        if row.get("input_output"):
+            try:
+                io = json.loads(row["input_output"])
+                inputs = io.get("inputs", [])
+                outputs = io.get("outputs", [])
+            except (json.JSONDecodeError, TypeError):
+                pass
+
+        code = solutions[0]["code"] if solutions else (row.get("starter_code") or "")
+        return {
+            "idx": idx,
+            "task_id": str(row["problem_id"]),
+            "entry_point": row["question"][:60].replace("\n", " ").strip(),
+            "code": code,
+            "highlighted_code": _highlight_code(code) if code else "",
+            "inputs": inputs[:5],
+            "outputs": outputs[:5],
+            "test": None,
+            "tasks": [],
+            "source": row.get("difficulty", "unknown"),
+            "has_ground_truth": False,
+            "has_tasks": False,
+            "description": row["question"],
+            "difficulty": row.get("difficulty", ""),
+            "solutions": solutions if len(solutions) > 1 else [],
+            "url": row.get("url", ""),
+            "starter_code": row.get("starter_code", ""),
+        }
+
+
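Both the `solutions` and `input_output` fields handled above arrive as JSON-encoded strings that may be empty, `None`, or malformed, hence the repeated try/except guards. The shared pattern can be factored as a sketch (hypothetical helper, not part of the commit):

```python
import json


def parse_json_field(raw, default):
    """Return json.loads(raw) when raw is a non-empty, valid JSON string;
    otherwise fall back to default (mirrors the APPS adapter's guards)."""
    if not raw:
        return default
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return default


io = parse_json_field('{"inputs": ["1 2"], "outputs": ["3"]}', {})
bad = parse_json_field("not json", {})  # malformed string falls back to {}
```

Catching `TypeError` alongside `JSONDecodeError` also covers non-string values such as `None` reaching `json.loads`.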
+# ---------------------------------------------------------------------------
+# MBPP adapter (HuggingFace: google-research-datasets/mbpp)
+# ---------------------------------------------------------------------------
+
+
+class MBPPAdapter(DatasetAdapter):
+    slug = "mbpp"
+    display_name = "MBPP"
+    has_ground_truth = False
+    has_tasks = False
+
+    def __init__(self, hf_dataset):
+        self._ds = hf_dataset
+
+    def problem_count(self) -> int:
+        return len(self._ds)
+
+    def get_problem_summary(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        return {
+            "idx": idx,
+            "task_id": str(row["task_id"]),
+            "entry_point": row["text"][:60].replace("\n", " ").strip(),
+            "num_inputs": len(row.get("test_list", [])),
+            "source": "MBPP",
+        }
+
+    def get_problem_detail(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        code = row["code"]
+        test_list = row.get("test_list", [])
+        challenge_tests = row.get("challenge_test_list", [])
+        all_tests = test_list + challenge_tests
+        return {
+            "idx": idx,
+            "task_id": str(row["task_id"]),
+            "entry_point": row["text"][:60].replace("\n", " ").strip(),
+            "code": code,
+            "highlighted_code": _highlight_code(code),
+            "inputs": [],
+            "outputs": [],
+            "test": "\n".join(all_tests),
+            "tasks": [],
+            "source": "MBPP",
+            "has_ground_truth": False,
+            "has_tasks": False,
+            "description": row["text"],
+        }
+
+
+# ---------------------------------------------------------------------------
+# CodeSearchNet adapter (HuggingFace: code-search-net/code_search_net)
+# ---------------------------------------------------------------------------
+
+
+class CodeSearchNetAdapter(DatasetAdapter):
+    slug = "codesearchnet"
+    display_name = "CodeSearchNet"
+    has_ground_truth = False
+    has_tasks = False
+
+    def __init__(self, hf_dataset):
+        self._ds = hf_dataset
+
+    def problem_count(self) -> int:
+        return len(self._ds)
+
+    def get_problem_summary(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        return {
+            "idx": idx,
+            "task_id": row.get("func_name", str(idx)),
+            "entry_point": row.get("func_name", f"csn_{idx}"),
+            "num_inputs": 0,
+            "source": row.get("language", "unknown"),
+        }
+
+    def get_problem_detail(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        code = row.get("func_code_string", "")
+        lang = row.get("language", "python")
+        return {
+            "idx": idx,
+            "task_id": row.get("func_name", str(idx)),
+            "entry_point": row.get("func_name", f"csn_{idx}"),
+            "code": code,
+            "highlighted_code": _highlight_code(code, language=lang),
+            "inputs": [],
+            "outputs": [],
+            "test": None,
+            "tasks": [],
+            "source": lang,
+            "has_ground_truth": False,
+            "has_tasks": False,
+            "description": row.get("func_documentation_string", ""),
+        }
+
+
+# ---------------------------------------------------------------------------
+# BigCodeBench adapter (HuggingFace: bigcode/bigcodebench)
+# ---------------------------------------------------------------------------
+
+
+class BigCodeBenchAdapter(DatasetAdapter):
+    slug = "bigcodebench"
+    display_name = "BigCodeBench"
+    has_ground_truth = False
+    has_tasks = False
+
+    def __init__(self, hf_dataset):
+        self._ds = hf_dataset
+
+    def problem_count(self) -> int:
+        return len(self._ds)
+
+    def get_problem_summary(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        return {
+            "idx": idx,
+            "task_id": row["task_id"],
+            "entry_point": row.get("entry_point", "task_func"),
+            "num_inputs": 0,
+            "source": "BigCodeBench",
+        }
+
+    def get_problem_detail(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        code = row.get("code_prompt", "") + row.get("canonical_solution", "")
+        libs = row.get("libs", "")
+        return {
+            "idx": idx,
+            "task_id": row["task_id"],
+            "entry_point": row.get("entry_point", "task_func"),
+            "code": code,
+            "highlighted_code": _highlight_code(code),
+            "inputs": [],
+            "outputs": [],
+            "test": row.get("test", ""),
+            "tasks": [],
+            "source": "BigCodeBench",
+            "has_ground_truth": False,
+            "has_tasks": False,
+            "description": row.get("complete_prompt", ""),
+            "libs": libs,
+        }
+
+
+# ---------------------------------------------------------------------------
+# EffiBench adapter (HuggingFace: DONG19/EffiBench)
+# ---------------------------------------------------------------------------
+
+
+class EffiBenchAdapter(DatasetAdapter):
+    slug = "effibench"
+    display_name = "EffiBench"
+    has_ground_truth = False
+    has_tasks = False
+
+    def __init__(self, hf_dataset):
+        self._ds = hf_dataset
+
+    def problem_count(self) -> int:
+        return len(self._ds)
+
+    def get_problem_summary(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        return {
+            "idx": idx,
+            "task_id": str(row.get("problem_idx", idx)),
+            "entry_point": row.get("task_name", f"effibench_{idx}"),
+            "num_inputs": 0,
+            "source": "EffiBench",
+        }
+
+    def get_problem_detail(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        code = row.get("canonical_solution", "")
+        return {
+            "idx": idx,
+            "task_id": str(row.get("problem_idx", idx)),
+            "entry_point": row.get("task_name", f"effibench_{idx}"),
+            "code": code,
+            "highlighted_code": _highlight_code(code),
+            "inputs": [],
+            "outputs": [],
+            "test": row.get("test_case", ""),
+            "tasks": [],
+            "source": "EffiBench",
+            "has_ground_truth": False,
+            "has_tasks": False,
+            "description": row.get("markdown_description", row.get("description", "")),
+        }

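All of the adapters above expose the same surface: `problem_count()`, `get_problem_summary(idx)`, and `get_problem_detail(idx)`. As a usage sketch of that contract (a stub in-memory dataset stands in for the HuggingFace dataset, and the class below is a stripped-down illustration, not one of the commit's adapters):

```python
from typing import Any

# Stand-in for a HuggingFace dataset: a plain list of MBPP-style rows.
rows = [{"task_id": 2, "prompt": "Write a function to add two numbers.",
         "code": "def add(a, b):\n    return a + b",
         "test_list": ["assert add(1, 2) == 3"]}]


class MiniAdapter:
    """Minimal sketch of the adapter surface (count + summary)."""

    def __init__(self, ds):
        self._ds = ds

    def problem_count(self) -> int:
        return len(self._ds)

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        return {
            "idx": idx,
            "task_id": str(row["task_id"]),
            # first 60 chars of the prompt, flattened to one line
            "entry_point": row["prompt"][:60].replace("\n", " ").strip(),
            "num_inputs": len(row["test_list"]),
        }


adapter = MiniAdapter(rows)
summary = adapter.get_problem_summary(0)
```

Keeping summaries cheap (no highlighting, no test parsing) is what lets the dataset dropdown and list view stay fast; the expensive work happens only in `get_problem_detail`.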
@@ -0,0 +1,366 @@
+"""Code reasoning / completion benchmark adapters (CRUXEval, SAFIM, HumanEval-X)."""
+
+from __future__ import annotations
+
+import re
+from typing import Any
+
+from adapters import DatasetAdapter
+
+# Injected at runtime by _set_helpers()
+_highlight_code = None
+_code_offset = None
+_extract_test_classes = None
+
+
+# ---------------------------------------------------------------------------
+# CRUXEval adapter (HuggingFace: cruxeval-org/cruxeval)
+# ---------------------------------------------------------------------------
+
+
+class CRUXEvalAdapter(DatasetAdapter):
+    slug = "cruxeval"
+    display_name = "CRUXEval"
+    has_ground_truth = False
+    has_tasks = True
+
+    def __init__(self, hf_dataset):
+        self._ds = hf_dataset
+
+    def problem_count(self) -> int:
+        return len(self._ds)
+
+    def get_problem_summary(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        return {
+            "idx": idx,
+            "task_id": row["id"],
+            "entry_point": "f",
+            "num_inputs": 1,
+            "source": "CRUXEval",
+        }
+
+    def get_problem_detail(self, idx: int) -> dict[str, Any]:
+        row = self._ds[idx]
+        code = row["code"]
+        return {
+            "idx": idx,
+            "task_id": row["id"],
+            "entry_point": "f",
+            "code": code,
+            "highlighted_code": _highlight_code(code),
+            "inputs": [row["input"]],
+            "outputs": [row["output"]],
+            "test": None,
+            "tasks": [
+                {
+                    "name": "Output Prediction",
+                    "description": "Given the code and input, predict the output.",
+                    "given": "input",
+                    "predict": "output",
+                    "input": row["input"],
+                    "output": row["output"],
+                },
+                {
+                    "name": "Input Prediction",
+                    "description": "Given the code and output, predict the input.",
+                    "given": "output",
+                    "predict": "input",
+                    "input": row["input"],
+                    "output": row["output"],
+                },
+            ],
+            "source": "CRUXEval",
+            "has_ground_truth": False,
+            "has_tasks": True,
+        }
+
+
| 79 |
+
# ---------------------------------------------------------------------------
# SAFIM adapter (HuggingFace: gonglinyuan/safim)
# ---------------------------------------------------------------------------


class SAFIMAdapter(DatasetAdapter):
    slug = "safim"
    display_name = "SAFIM"
    has_ground_truth = False
    has_tasks = False

    def __init__(self, hf_dataset):
        self._ds = hf_dataset

    def problem_count(self) -> int:
        return len(self._ds)

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        return {
            "idx": idx,
            "task_id": row.get("task_id", str(idx)),
            "entry_point": row.get("task_id", f"safim_{idx}"),
            "num_inputs": 0,
            "source": row.get("lang", "unknown"),
        }

    # Patterns that mark where the completion should be inserted
    _HOLE_MARKERS = [
        "{{completion}}",
        "/* TODO: Your code here */",
        "// TODO: Your code here",
        "# TODO: Your code here",
    ]

    def _find_hole_marker(self, prompt: str) -> str | None:
        """Return the first matching hole marker found in the prompt, or None."""
        for marker in self._HOLE_MARKERS:
            if marker in prompt:
                return marker
        return None

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        prompt = row.get("prompt", "")
        ground_truth = row.get("ground_truth", "")
        lang = row.get("lang", "python")

        marker = self._find_hole_marker(prompt)

        if marker:
            display_code = prompt.replace(marker, "/* [HOLE] */")
            before_hole = prompt.split(marker)[0]
            merged_code = prompt.replace(marker, ground_truth)
        else:
            display_code = prompt + "\n/* [HOLE] */\n"
            before_hole = prompt + "\n"
            merged_code = prompt + "\n" + ground_truth + "\n"

        # Compute 1-indexed line range of the inserted ground truth
        gt_start_line = before_hole.count("\n") + 1
        gt_line_count = ground_truth.count("\n") + (1 if ground_truth else 0)
        gt_end_line = gt_start_line + gt_line_count - 1

        lang_key = {"Python": "python", "Java": "java", "C++": "cpp", "C#": "csharp"}.get(
            lang, lang.lower()
        )

        return {
            "idx": idx,
            "task_id": row.get("task_id", str(idx)),
            "entry_point": row.get("task_id", f"safim_{idx}"),
            "code": display_code,
            "highlighted_code": _highlight_code(display_code, language=lang_key),
            "inputs": [],
            "outputs": [],
            "test": None,
            "tasks": [],
            "source": lang,
            "has_ground_truth": False,
            "has_tasks": False,
            "fim_prefix": prompt,
            "fim_ground_truth": ground_truth,
            "fim_ground_truth_highlighted": _highlight_code(ground_truth, language=lang_key),
            "fim_merged_code": merged_code,
            "fim_merged_highlighted": _highlight_code(
                merged_code,
                highlight_lines=list(range(gt_start_line, gt_end_line + 1)),
                language=lang_key,
            ),
            "fim_gt_start_line": gt_start_line,
            "fim_gt_end_line": gt_end_line,
            "language": lang,
        }

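The fill-in-the-middle view depends on the line-range arithmetic above to highlight the spliced ground truth. A standalone sketch of that arithmetic (same counting logic, extracted from the adapter; the toy prompt is illustrative):

```python
def gt_line_range(prompt: str, marker: str, ground_truth: str) -> tuple[int, int]:
    """1-indexed line range the ground truth occupies once spliced into the prompt."""
    before_hole = prompt.split(marker)[0]
    # Line number of the first inserted line = newlines before the hole + 1.
    start = before_hole.count("\n") + 1
    # A non-empty insertion spans (newlines inside it) + 1 lines.
    count = ground_truth.count("\n") + (1 if ground_truth else 0)
    return start, start + count - 1

prompt = "def f(x):\n    {{completion}}\n    return y\n"
one_liner = gt_line_range(prompt, "{{completion}}", "y = x * 2")
two_liner = gt_line_range(prompt, "{{completion}}", "y = x\ny = y * 2")
```

The one-line completion lands on line 2 of the merged file; a two-line completion occupies lines 2–3, which is the range the viewer highlights.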
# ---------------------------------------------------------------------------
# HumanEval-X helpers (HuggingFace: THUDM/humaneval-x)
# ---------------------------------------------------------------------------


def _extract_func_name(declaration: str) -> str:
    """Extract the function/method name from a code declaration string."""
    m = re.search(r"def\s+(\w+)\s*\(", declaration)
    if m:
        return m.group(1)
    m = re.search(r"(\w+)\s*\(", declaration)
    if m:
        return m.group(1)
    return ""

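The helper first tries the Python `def name(` form, then falls back to the first `name(` it finds, which covers C-style declarations. A self-contained sketch of the same two-pattern approach (example declarations are illustrative):

```python
import re

def extract_func_name(declaration: str) -> str:
    """Pull a function name out of a declaration line, Python-first."""
    # Python: "def name(" is unambiguous, so prefer it.
    m = re.search(r"def\s+(\w+)\s*\(", declaration)
    if m:
        return m.group(1)
    # Fallback: first identifier immediately followed by "(" (C++/Java/Go style).
    m = re.search(r"(\w+)\s*\(", declaration)
    return m.group(1) if m else ""

py_name = extract_func_name("def has_close_elements(numbers, threshold):")
cpp_name = extract_func_name("int maxFill(vector<vector<int>> grid, int capacity) {")
```

The fallback is deliberately loose; for a return type like `int` it still works because `int` is followed by a space, not `(`, so the first identifier that actually precedes a parenthesis wins.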
# ---------------------------------------------------------------------------
# HumanEvalPack adapter (HuggingFace: bigcode/humanevalpack)
# ---------------------------------------------------------------------------


class HumanEvalPackAdapter(DatasetAdapter):
    slug = "humanevalpack"
    display_name = "HumanEvalPack"
    has_ground_truth = False
    has_tasks = False

    LANGUAGES = ["python", "js", "cpp", "go", "java", "rust"]

    def __init__(self, datasets_by_lang: dict[str, Any]):
        self._by_lang = datasets_by_lang
        first_lang = next(iter(self._by_lang))
        self._count = len(self._by_lang[first_lang])

    def problem_count(self) -> int:
        return self._count

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        first_lang = next(iter(self._by_lang))
        row = self._by_lang[first_lang][idx]
        return {
            "idx": idx,
            "task_id": row["task_id"],
            "entry_point": row.get("entry_point", f"problem_{idx}"),
            "num_inputs": len(self._by_lang),
            "source": "HumanEvalPack",
        }

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        first_lang = next(iter(self._by_lang))
        row = self._by_lang[first_lang][idx]

        lang_labels = {
            "python": "Python",
            "js": "JavaScript",
            "cpp": "C++",
            "go": "Go",
            "java": "Java",
            "rust": "Rust",
        }
        lang_pygments = {
            "python": "python",
            "js": "javascript",
            "cpp": "cpp",
            "go": "go",
            "java": "java",
            "rust": "rust",
        }

        lang_solutions = []
        for lang in self.LANGUAGES:
            if lang not in self._by_lang:
                continue
            lrow = self._by_lang[lang][idx]
            canonical = lrow.get("prompt", "") + lrow.get("canonical_solution", "")
            buggy = lrow.get("prompt", "") + lrow.get("buggy_solution", "")
            lang_key = lang_pygments.get(lang, lang)
            lang_solutions.append(
                {
                    "language": lang,
                    "language_label": lang_labels.get(lang, lang),
                    "code": canonical,
                    "highlighted_code": _highlight_code(canonical, language=lang_key),
                    "buggy_code": buggy,
                    "buggy_highlighted_code": _highlight_code(buggy, language=lang_key),
                    "test": lrow.get("test", ""),
                    "example_test": lrow.get("example_test", ""),
                    "bug_type": lrow.get("bug_type", ""),
                    "failure_symptoms": lrow.get("failure_symptoms", ""),
                }
            )

        py_row = self._by_lang.get("python", self._by_lang[first_lang])[idx]
        default_code = py_row.get("prompt", "") + py_row.get("canonical_solution", "")

        return {
            "idx": idx,
            "task_id": row["task_id"],
            "entry_point": row.get("entry_point", f"problem_{idx}"),
            "code": default_code,
            "highlighted_code": _highlight_code(default_code),
            "inputs": [],
            "outputs": [],
            "test": py_row.get("test", ""),
            "tasks": [],
            "source": "HumanEvalPack",
            "has_ground_truth": False,
            "has_tasks": False,
            "description": row.get("instruction", row.get("docstring", "")),
            "lang_solutions": lang_solutions,
            "bug_type": py_row.get("bug_type", ""),
            "failure_symptoms": py_row.get("failure_symptoms", ""),
        }


# ---------------------------------------------------------------------------
# HumanEval-X adapter (HuggingFace: THUDM/humaneval-x)
# ---------------------------------------------------------------------------


class HumanEvalXAdapter(DatasetAdapter):
    slug = "humanevalx"
    display_name = "HumanEval-X"
    has_ground_truth = False
    has_tasks = False

    LANGUAGES = ["python", "cpp", "java", "go", "js"]

    def __init__(self, datasets_by_lang: dict[str, Any]):
        """datasets_by_lang maps language name -> HF dataset split."""
        self._by_lang = datasets_by_lang
        first_lang = next(iter(self._by_lang))
        self._count = len(self._by_lang[first_lang])

    def problem_count(self) -> int:
        return self._count

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        first_lang = next(iter(self._by_lang))
        row = self._by_lang[first_lang][idx]
        task_id = row["task_id"].split("/")[-1]
        decl = row.get("declaration", row.get("prompt", ""))
        entry = _extract_func_name(decl) or f"problem_{task_id}"
        return {
            "idx": idx,
            "task_id": f"HumanEval/{task_id}",
            "entry_point": entry,
            "num_inputs": len(self._by_lang),
            "source": "HumanEval-X",
        }

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        first_lang = next(iter(self._by_lang))
        row = self._by_lang[first_lang][idx]
        task_id = row["task_id"].split("/")[-1]
        decl = row.get("declaration", row.get("prompt", ""))
        entry = _extract_func_name(decl) or f"problem_{task_id}"

        lang_solutions = []
        for lang in self.LANGUAGES:
            if lang not in self._by_lang:
                continue
            lrow = self._by_lang[lang][idx]
            code = lrow["prompt"] + lrow["canonical_solution"]
            lang_solutions.append(
                {
                    "language": lang,
                    "code": code,
                    "highlighted_code": _highlight_code(code, language=lang),
                    "test": lrow.get("test", ""),
                    "example_test": lrow.get("example_test", ""),
                }
            )

        py_row = self._by_lang.get("python", self._by_lang[first_lang])[idx]
        default_code = py_row["prompt"] + py_row["canonical_solution"]

        return {
            "idx": idx,
            "task_id": f"HumanEval/{task_id}",
            "entry_point": entry,
            "code": default_code,
            "highlighted_code": _highlight_code(default_code),
            "inputs": [],
            "outputs": [],
            "test": py_row.get("test", ""),
            "tasks": [],
            "source": "HumanEval-X",
            "has_ground_truth": False,
            "has_tasks": False,
            "lang_solutions": lang_solutions,
        }
@@ -0,0 +1,410 @@
"""Dataset registration — loads all HuggingFace datasets into the adapter registry."""

from __future__ import annotations

import json
import random
from typing import Any

from adapters import REGISTRY
from adapters.code_editing import (
    CanItEditAdapter,
    CodeEditorBenchAdapter,
    CodeXGLUERefinementAdapter,
    CommitBenchAdapter,
    DebugBenchAdapter,
    SWEBenchFullAdapter,
    SWEBenchLiteAdapter,
    SWEBenchVerifiedAdapter,
)
from adapters.code_generation import (
    APPSAdapter,
    BigCodeBenchAdapter,
    BigOBenchAdapter,
    ClassEvalAdapter,
    CodeContestsAdapter,
    CodeSearchNetAdapter,
    EffiBenchAdapter,
    HumanEvalPlusAdapter,
    LiveCodeBenchAdapter,
    MBPPAdapter,
    MBPPPlusAdapter,
    REvalAdapter,
    merge_bigobench,
)
from adapters.code_reasoning import (
    CRUXEvalAdapter,
    HumanEvalPackAdapter,
    HumanEvalXAdapter,
    SAFIMAdapter,
)
from adapters.vulnerability import (
    BigVulAdapter,
    DevignAdapter,
    DiverseVulAdapter,
    PrimeVulAdapter,
)

# ---------------------------------------------------------------------------
# Sampling: cap large datasets at MAX_DISPLAY_SAMPLES for fast browsing
# ---------------------------------------------------------------------------

MAX_DISPLAY_SAMPLES = 1000
_SAMPLE_SEED = 42


def _sample_indices(total: int) -> list[int]:
    """Return a sorted list of up to MAX_DISPLAY_SAMPLES random indices."""
    if total <= MAX_DISPLAY_SAMPLES:
        return list(range(total))
    rng = random.Random(_SAMPLE_SEED)
    return sorted(rng.sample(range(total), MAX_DISPLAY_SAMPLES))


def _sample_hf_dataset(ds):
    """Return a HuggingFace dataset (or subset) with at most MAX_DISPLAY_SAMPLES rows."""
    if len(ds) <= MAX_DISPLAY_SAMPLES:
        return ds
    indices = _sample_indices(len(ds))
    return ds.select(indices)


def _sample_list(rows: list) -> list:
    """Return a list with at most MAX_DISPLAY_SAMPLES items."""
    if len(rows) <= MAX_DISPLAY_SAMPLES:
        return rows
    indices = _sample_indices(len(rows))
    return [rows[i] for i in indices]

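Because the sampler builds a fresh `random.Random(42)` per call, every process sees the same subset of a large dataset, which is what makes the "1000 of 33050" dropdown label stable across restarts. A quick standalone check of that determinism (constants mirror the module's seed and cap):

```python
import random

MAX_DISPLAY_SAMPLES = 1000
_SAMPLE_SEED = 42

def sample_indices(total: int) -> list[int]:
    """Sorted, reproducible subset of range(total), capped at MAX_DISPLAY_SAMPLES."""
    if total <= MAX_DISPLAY_SAMPLES:
        return list(range(total))
    # A fresh seeded Random per call keeps results independent of global state.
    rng = random.Random(_SAMPLE_SEED)
    return sorted(rng.sample(range(total), MAX_DISPLAY_SAMPLES))

a = sample_indices(33050)  # e.g. the DiverseVul-sized case from the dropdown
b = sample_indices(33050)
```

Sorting the sample preserves the dataset's original ordering within the subset, so browsing still feels like walking the dataset front to back.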
def _load_jsonl_dataset(repo_id: str, filenames: list[str]) -> list[dict[str, Any]]:
    """Download JSONL files from a HuggingFace dataset repo and return as a list of dicts.

    This bypasses the ``datasets`` library when the repo uses deprecated loading scripts.
    """
    from huggingface_hub import hf_hub_download

    rows: list[dict[str, Any]] = []
    for fname in filenames:
        path = hf_hub_download(repo_id, fname, repo_type="dataset")
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line:
                    rows.append(json.loads(line))
    return rows

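The download step aside, the loader's inner loop is plain JSON-Lines parsing that tolerates blank lines. A minimal sketch of that parsing step alone, with a hypothetical two-record payload:

```python
import json

def parse_jsonl(text: str) -> list[dict]:
    """Parse JSON-Lines text into dicts, skipping blank lines (mirrors the loader's loop)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(json.loads(line))
    return rows

text = '{"task_id": "t1"}\n\n{"task_id": "t2"}\n'
rows = parse_jsonl(text)
```

Skipping blank lines matters in practice: concatenating several per-split JSONL files (as the LiveCodeBench and CodeEditorBench loads do) can introduce empty lines at file boundaries.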
def register_hf_datasets() -> None:
    """Load all HuggingFace datasets into :data:`REGISTRY`."""
    from datasets import load_dataset

    # --- Base datasets ---

    try:
        problems = load_dataset("JetBrains-Research/REval", "problems", split="test")
        tasks = load_dataset("JetBrains-Research/REval", "tasks", split="test")
        executions = load_dataset("JetBrains-Research/REval", "executions", split="test")
        states = load_dataset("JetBrains-Research/REval", "states", split="test")
        REGISTRY["reval"] = REvalAdapter(problems, tasks, executions, states)
        print(f"Loaded REval: {len(problems)} problems")
    except Exception as e:
        print(f"Warning: could not load REval: {e}")

    try:
        crux = load_dataset("cruxeval-org/cruxeval", split="test")
        REGISTRY["cruxeval"] = CRUXEvalAdapter(crux)
        print(f"Loaded CRUXEval: {len(crux)} problems")
    except Exception as e:
        print(f"Warning: could not load CRUXEval: {e}")

    try:
        heplus = load_dataset("evalplus/humanevalplus", split="test")
        REGISTRY["humanevalplus"] = HumanEvalPlusAdapter(heplus)
        print(f"Loaded HumanEval+: {len(heplus)} problems")
    except Exception as e:
        print(f"Warning: could not load HumanEval+: {e}")

    try:
        ds_time = load_dataset(
            "facebook/BigOBench", "time_complexity_test_set.jsonl", split="train"
        )
        ds_space = load_dataset(
            "facebook/BigOBench", "space_complexity_test_set.jsonl", split="train"
        )
        merged = merge_bigobench(ds_time, ds_space)
        REGISTRY["bigobench"] = BigOBenchAdapter(merged)
        print(
            f"Loaded BigOBench: {len(merged)} problems "
            f"({len(ds_time)} time + {len(ds_space)} space)"
        )
    except Exception as e:
        print(f"Warning: could not load BigOBench: {e}")

    # --- Batch 1 datasets ---

    try:
        mbppplus = load_dataset("evalplus/mbppplus", split="test")
        REGISTRY["mbppplus"] = MBPPPlusAdapter(mbppplus)
        print(f"Loaded MBPP+: {len(mbppplus)} problems")
    except Exception as e:
        print(f"Warning: could not load MBPP+: {e}")

    try:
        classeval = load_dataset("FudanSELab/ClassEval", split="test")
        REGISTRY["classeval"] = ClassEvalAdapter(classeval)
        print(f"Loaded ClassEval: {len(classeval)} problems")
    except Exception as e:
        print(f"Warning: could not load ClassEval: {e}")

    try:
        lcb = _load_jsonl_dataset(
            "livecodebench/code_generation_lite",
            [
                "test.jsonl",
                "test2.jsonl",
                "test3.jsonl",
                "test4.jsonl",
                "test5.jsonl",
                "test6.jsonl",
            ],
        )
        lcb_sampled = _sample_list(lcb)
        adapter = LiveCodeBenchAdapter(lcb_sampled)
        adapter.total_count = len(lcb)
        REGISTRY["livecodebench"] = adapter
        print(f"Loaded LiveCodeBench: {len(lcb_sampled)} problems (of {len(lcb)})")
    except Exception as e:
        print(f"Warning: could not load LiveCodeBench: {e}")

    try:
        debugbench_full = load_dataset("Rtian/DebugBench", split="test")
        debugbench = _sample_hf_dataset(debugbench_full)
        adapter = DebugBenchAdapter(debugbench)
        adapter.total_count = len(debugbench_full)
        REGISTRY["debugbench"] = adapter
        print(f"Loaded DebugBench: {len(debugbench)} problems (of {len(debugbench_full)})")
    except Exception as e:
        print(f"Warning: could not load DebugBench: {e}")

    try:
        hx_datasets = {}
        for lang in HumanEvalXAdapter.LANGUAGES:
            hx_datasets[lang] = _load_jsonl_dataset(
                "THUDM/humaneval-x",
                [f"data/{lang}/data/humaneval.jsonl"],
            )
        REGISTRY["humanevalx"] = HumanEvalXAdapter(hx_datasets)
        print(
            f"Loaded HumanEval-X: {len(hx_datasets)} languages, "
            f"{len(hx_datasets[next(iter(hx_datasets))])} problems each"
        )
    except Exception as e:
        print(f"Warning: could not load HumanEval-X: {e}")

    # --- Batch 2 datasets ---

    try:
        swe = load_dataset("princeton-nlp/SWE-bench_Lite", split="test")
        REGISTRY["swebenchlite"] = SWEBenchLiteAdapter(swe)
        print(f"Loaded SWE-bench Lite: {len(swe)} problems")
    except Exception as e:
        print(f"Warning: could not load SWE-bench Lite: {e}")

    try:
        cc = load_dataset("deepmind/code_contests", split="test")
        REGISTRY["codecontests"] = CodeContestsAdapter(cc)
        print(f"Loaded CodeContests: {len(cc)} problems")
    except Exception as e:
        print(f"Warning: could not load CodeContests: {e}")

    try:
        apps_full = load_dataset(
            "codeparrot/apps",
            "default",
            split="test",
            revision="refs/convert/parquet",
        )
        apps = _sample_hf_dataset(apps_full)
        adapter = APPSAdapter(apps)
        adapter.total_count = len(apps_full)
        REGISTRY["apps"] = adapter
        print(f"Loaded APPS: {len(apps)} problems (of {len(apps_full)})")
    except Exception as e:
        print(f"Warning: could not load APPS: {e}")

    try:
        cie = load_dataset("nuprl/CanItEdit", split="test")
        REGISTRY["canitedit"] = CanItEditAdapter(cie)
        print(f"Loaded CanItEdit: {len(cie)} problems")
    except Exception as e:
        print(f"Warning: could not load CanItEdit: {e}")

    try:
        mbpp = load_dataset("google-research-datasets/mbpp", "full", split="test")
        REGISTRY["mbpp"] = MBPPAdapter(mbpp)
        print(f"Loaded MBPP: {len(mbpp)} problems")
    except Exception as e:
        print(f"Warning: could not load MBPP: {e}")

    # --- Batch 3 datasets ---

    try:
        safim_full = load_dataset("gonglinyuan/safim", "block", split="test")
        safim = _sample_hf_dataset(safim_full)
        adapter = SAFIMAdapter(safim)
        adapter.total_count = len(safim_full)
        REGISTRY["safim"] = adapter
        print(f"Loaded SAFIM: {len(safim)} problems (of {len(safim_full)})")
    except Exception as e:
        print(f"Warning: could not load SAFIM: {e}")

    try:
        bigvul_full = load_dataset("bstee615/bigvul", split="test")
        bigvul = _sample_hf_dataset(bigvul_full)
        adapter = BigVulAdapter(bigvul)
        adapter.total_count = len(bigvul_full)
        REGISTRY["bigvul"] = adapter
        print(f"Loaded BigVul: {len(bigvul)} problems (of {len(bigvul_full)})")
    except Exception as e:
        print(f"Warning: could not load BigVul: {e}")

    try:
        diversevul_full = load_dataset("claudios/DiverseVul", split="test")
        diversevul = _sample_hf_dataset(diversevul_full)
        adapter = DiverseVulAdapter(diversevul)
        adapter.total_count = len(diversevul_full)
        REGISTRY["diversevul"] = adapter
        print(f"Loaded DiverseVul: {len(diversevul)} problems (of {len(diversevul_full)})")
    except Exception as e:
        print(f"Warning: could not load DiverseVul: {e}")

    try:
        primevul_full = load_dataset(
            "json",
            data_files="hf://datasets/starsofchance/PrimeVul/primevul_test.jsonl",
            split="train",
        )
        primevul = _sample_hf_dataset(primevul_full)
        adapter = PrimeVulAdapter(primevul)
        adapter.total_count = len(primevul_full)
        REGISTRY["primevul"] = adapter
        print(f"Loaded PrimeVul: {len(primevul)} problems (of {len(primevul_full)})")
    except Exception as e:
        print(f"Warning: could not load PrimeVul: {e}")

    try:
        ceb_rows: list[dict[str, Any]] = []
        ceb_files = [
            ("code_debug", ["code_debug_primary.jsonl", "code_debug_plus.jsonl"]),
            ("code_translate", ["code_translate_primary.jsonl", "code_translate_plus.jsonl"]),
            ("code_polishment", ["code_polishment_primary.jsonl", "code_polishment_plus.jsonl"]),
            ("code_switch", ["code_switch_primary.jsonl", "code_switch_plus.jsonl"]),
        ]
        for task_type, filenames in ceb_files:
            try:
                rows = _load_jsonl_dataset("m-a-p/CodeEditorBench", filenames)
                for d in rows:
                    d["_task_type"] = task_type
                    if "difficulty" in d:
                        d["difficulty"] = str(d["difficulty"])
                ceb_rows.extend(rows)
            except Exception:
                pass  # skip task types that fail
        if ceb_rows:
            ceb_sampled = _sample_list(ceb_rows)
            adapter = CodeEditorBenchAdapter(ceb_sampled)
            adapter.total_count = len(ceb_rows)
            REGISTRY["codeeditorbench"] = adapter
            print(f"Loaded CodeEditorBench: {len(ceb_sampled)} problems (of {len(ceb_rows)})")
        else:
            print("Warning: could not load any CodeEditorBench task types")
    except Exception as e:
        print(f"Warning: could not load CodeEditorBench: {e}")

    # --- Batch 4 datasets ---

    try:
        swe_v = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")
        REGISTRY["swebenchverified"] = SWEBenchVerifiedAdapter(swe_v)
        print(f"Loaded SWE-bench Verified: {len(swe_v)} problems")
    except Exception as e:
        print(f"Warning: could not load SWE-bench Verified: {e}")

    try:
        csn_full = load_dataset("code-search-net/code_search_net", "python", split="test")
        csn = _sample_hf_dataset(csn_full)
        adapter = CodeSearchNetAdapter(csn)
        adapter.total_count = len(csn_full)
        REGISTRY["codesearchnet"] = adapter
        print(f"Loaded CodeSearchNet: {len(csn)} problems (of {len(csn_full)})")
    except Exception as e:
        print(f"Warning: could not load CodeSearchNet: {e}")

    try:
        devign_full = load_dataset("google/code_x_glue_cc_defect_detection", split="test")
        devign = _sample_hf_dataset(devign_full)
        adapter = DevignAdapter(devign)
        adapter.total_count = len(devign_full)
        REGISTRY["devign"] = adapter
        print(f"Loaded Devign: {len(devign)} problems (of {len(devign_full)})")
    except Exception as e:
        print(f"Warning: could not load Devign: {e}")

    # --- Batch 5 datasets ---

    try:
        bcb = load_dataset("bigcode/bigcodebench", split="v0.1.4")
        REGISTRY["bigcodebench"] = BigCodeBenchAdapter(bcb)
        print(f"Loaded BigCodeBench: {len(bcb)} problems")
    except Exception as e:
        print(f"Warning: could not load BigCodeBench: {e}")

    try:
        hep_datasets = {}
        for lang in HumanEvalPackAdapter.LANGUAGES:
            hep_datasets[lang] = load_dataset("bigcode/humanevalpack", lang, split="test")
        REGISTRY["humanevalpack"] = HumanEvalPackAdapter(hep_datasets)
        print(
            f"Loaded HumanEvalPack: {len(hep_datasets)} languages, "
            f"{len(hep_datasets[next(iter(hep_datasets))])} problems each"
        )
    except Exception as e:
        print(f"Warning: could not load HumanEvalPack: {e}")

    try:
        cxr_full = load_dataset("google/code_x_glue_cc_code_refinement", "medium", split="test")
        cxr = _sample_hf_dataset(cxr_full)
        adapter = CodeXGLUERefinementAdapter(cxr)
        adapter.total_count = len(cxr_full)
        REGISTRY["codexgluerefinement"] = adapter
        print(f"Loaded CodeXGLUE Code Refinement: {len(cxr)} problems (of {len(cxr_full)})")
    except Exception as e:
        print(f"Warning: could not load CodeXGLUE Code Refinement: {e}")

    try:
        swe_full_ds = load_dataset("princeton-nlp/SWE-bench", split="test")
        swe_full = _sample_hf_dataset(swe_full_ds)
        adapter = SWEBenchFullAdapter(swe_full)
        adapter.total_count = len(swe_full_ds)
        REGISTRY["swebenchfull"] = adapter
        print(f"Loaded SWE-bench: {len(swe_full)} problems (of {len(swe_full_ds)})")
    except Exception as e:
        print(f"Warning: could not load SWE-bench: {e}")

    try:
        cb_full = load_dataset("Maxscha/commitbench", split="test")
        cb = _sample_hf_dataset(cb_full)
        adapter = CommitBenchAdapter(cb)
        adapter.total_count = len(cb_full)
        REGISTRY["commitbench"] = adapter
        print(f"Loaded CommitBench: {len(cb)} problems (of {len(cb_full)})")
    except Exception as e:
        print(f"Warning: could not load CommitBench: {e}")

    try:
        effibench = load_dataset("DONG19/EffiBench", split="train")
        REGISTRY["effibench"] = EffiBenchAdapter(effibench)
        print(f"Loaded EffiBench: {len(effibench)} problems")
    except Exception as e:
        print(f"Warning: could not load EffiBench: {e}")
@@ -0,0 +1,245 @@
"""Vulnerability detection benchmark adapters (BigVul, DiverseVul, PrimeVul, Devign)."""

from __future__ import annotations

from typing import Any

from adapters import DatasetAdapter

# Injected at runtime by _set_helpers()
_highlight_code = None
_code_offset = None
_extract_test_classes = None


# ---------------------------------------------------------------------------
# BigVul adapter (HuggingFace: bstee615/bigvul)
# ---------------------------------------------------------------------------


class BigVulAdapter(DatasetAdapter):
    slug = "bigvul"
    display_name = "BigVul"
    has_ground_truth = False
    has_tasks = False

    def __init__(self, hf_dataset):
        self._ds = hf_dataset

    def problem_count(self) -> int:
        return len(self._ds)

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        return {
            "idx": idx,
            "task_id": row.get("CVE_ID", str(idx)),
            "entry_point": row.get("CVE_ID", f"bigvul_{idx}"),
            "num_inputs": 0,
            "source": row.get("CWE_ID", "unknown"),
        }

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        vuln_code = row.get("func_before", "")
        fixed_code = row.get("func_after", "")
        lang = row.get("lang", "c")
        lang_key = {"C": "c", "Java": "java", "PHP": "php"}.get(lang, "c")
        return {
            "idx": idx,
            "task_id": row.get("CVE_ID", str(idx)),
            "entry_point": row.get("CVE_ID", f"bigvul_{idx}"),
            "code": fixed_code,
            "highlighted_code": _highlight_code(fixed_code, language=lang_key),
            "inputs": [],
            "outputs": [],
            "test": None,
            "tasks": [],
            "source": row.get("CWE_ID", "unknown"),
            "has_ground_truth": False,
            "has_tasks": False,
            "description": row.get("commit_message", ""),
            "vulnerable_code": vuln_code,
            "vulnerable_highlighted_code": _highlight_code(vuln_code, language=lang_key),
            "patched_code": fixed_code,
            "patched_highlighted_code": _highlight_code(fixed_code, language=lang_key),
            "cwe_id": row.get("CWE_ID", ""),
            "cve_id": row.get("CVE_ID", ""),
            "project": row.get("project", ""),
            "language": lang,
            "is_vulnerable": bool(row.get("vul", 0)),
        }


# ---------------------------------------------------------------------------
# DiverseVul adapter (HuggingFace: claudios/DiverseVul)
# ---------------------------------------------------------------------------


class DiverseVulAdapter(DatasetAdapter):
    slug = "diversevul"
    display_name = "DiverseVul"
    has_ground_truth = False
    has_tasks = False

    def __init__(self, hf_dataset):
        self._ds = hf_dataset

    def problem_count(self) -> int:
        return len(self._ds)

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        cwe_list = row.get("cwe", [])
        cwe_label = cwe_list[0] if cwe_list else "unknown"
        label = "Vulnerable" if row.get("target", 0) == 1 else "Patched"
        return {
            "idx": idx,
            "task_id": row.get("commit_id", str(idx))[:12],
            "entry_point": row.get("project", f"diversevul_{idx}"),
            "num_inputs": 0,
            "source": f"{label}/{cwe_label}",
        }

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        code = row.get("func", "")
        cwe_list = list(row.get("cwe", []))
        is_vuln = row.get("target", 0) == 1
        return {
            "idx": idx,
            "task_id": row.get("commit_id", str(idx))[:12],
            "entry_point": row.get("project", f"diversevul_{idx}"),
            "code": code,
            "highlighted_code": _highlight_code(code, language="c"),
            "inputs": [],
            "outputs": [],
            "test": None,
            "tasks": [],
            "source": "Vulnerable" if is_vuln else "Patched",
            "has_ground_truth": False,
            "has_tasks": False,
            "description": row.get("message", ""),
            "vulnerable_code": code if is_vuln else "",
            "vulnerable_highlighted_code": _highlight_code(code, language="c") if is_vuln else "",
            "patched_code": code if not is_vuln else "",
            "patched_highlighted_code": (
                _highlight_code(code, language="c") if not is_vuln else ""
            ),
            "cwe_id": ", ".join(cwe_list) if cwe_list else "",
            "project": row.get("project", ""),
            "language": "C/C++",
            "is_vulnerable": is_vuln,
        }


# ---------------------------------------------------------------------------
# PrimeVul adapter (HuggingFace: starsofchance/PrimeVul)
# ---------------------------------------------------------------------------


class PrimeVulAdapter(DatasetAdapter):
    slug = "primevul"
    display_name = "PrimeVul"
    has_ground_truth = False
    has_tasks = False

    def __init__(self, hf_dataset):
        self._ds = hf_dataset

    def problem_count(self) -> int:
        return len(self._ds)

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        label = "Vulnerable" if row.get("target", 0) == 1 else "Patched"
        return {
            "idx": idx,
            "task_id": row.get("commit_id", str(idx))[:12],
            "entry_point": row.get("project", f"primevul_{idx}"),
            "num_inputs": 0,
            "source": label,
        }

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        code = row.get("func", "")
        is_vuln = row.get("target", 0) == 1
        cwe_list = list(row.get("cwe", []))
        return {
            "idx": idx,
            "task_id": row.get("commit_id", str(idx))[:12],
            "entry_point": row.get("project", f"primevul_{idx}"),
            "code": code,
            "highlighted_code": _highlight_code(code, language="c"),
            "inputs": [],
            "outputs": [],
            "test": None,
            "tasks": [],
            "source": "Vulnerable" if is_vuln else "Patched",
            "has_ground_truth": False,
            "has_tasks": False,
            "description": row.get("commit_message", ""),
            "vulnerable_code": code if is_vuln else "",
            "vulnerable_highlighted_code": _highlight_code(code, language="c") if is_vuln else "",
            "patched_code": code if not is_vuln else "",
            "patched_highlighted_code": (
                _highlight_code(code, language="c") if not is_vuln else ""
            ),
            "cwe_id": ", ".join(cwe_list) if cwe_list else "",
            "project": row.get("project", ""),
            "language": "C/C++",
            "is_vulnerable": is_vuln,
        }


# ---------------------------------------------------------------------------
# Devign adapter (HuggingFace: google/code_x_glue_cc_defect_detection)
# ---------------------------------------------------------------------------


class DevignAdapter(DatasetAdapter):
    slug = "devign"
    display_name = "Devign"
    has_ground_truth = False
    has_tasks = False

    def __init__(self, hf_dataset):
        self._ds = hf_dataset

    def problem_count(self) -> int:
        return len(self._ds)

    def get_problem_summary(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        label = "Vulnerable" if row.get("target", 0) == 1 else "Clean"
        return {
            "idx": idx,
            "task_id": str(row.get("commit_id", idx))[:12],
            "entry_point": row.get("project", f"devign_{idx}"),
            "num_inputs": 0,
            "source": label,
        }

    def get_problem_detail(self, idx: int) -> dict[str, Any]:
        row = self._ds[idx]
        code = row.get("func", "")
        is_vuln = row.get("target", 0) == 1
        return {
            "idx": idx,
            "task_id": str(row.get("commit_id", idx))[:12],
            "entry_point": row.get("project", f"devign_{idx}"),
            "code": code,
            "highlighted_code": _highlight_code(code, language="c"),
            "inputs": [],
            "outputs": [],
            "test": None,
            "tasks": [],
            "source": "Vulnerable" if is_vuln else "Clean",
            "has_ground_truth": False,
            "has_tasks": False,
            "description": row.get("commit_message", ""),
            "is_vulnerable": is_vuln,
            "project": row.get("project", ""),
            "language": "C",
        }
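The single-function datasets (DiverseVul, PrimeVul, Devign) carry one `func` string plus a binary `target` label, so the before/after panels are filled one-sidedly. A minimal sketch of that mapping as a pure function (`vuln_view` is a hypothetical name; the real adapters additionally run the code through the injected Pygments highlighter):

```python
def vuln_view(row: dict) -> dict:
    """Derive the one-sided before/after view fields from a
    DiverseVul/PrimeVul-style row: one function, one binary label."""
    code = row.get("func", "")
    is_vuln = row.get("target", 0) == 1
    return {
        # Only one panel is populated, depending on the label.
        "vulnerable_code": code if is_vuln else "",
        "patched_code": code if not is_vuln else "",
        "is_vulnerable": is_vuln,
        "source": "Vulnerable" if is_vuln else "Patched",
    }


view = vuln_view({"func": "int f() { return 0; }", "target": 1})
```

This is why the UI can reuse the same diff layout for these datasets even though no paired before/after function exists.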
app.py
@@ -12,7 +12,7 @@ import os
 from flask import Flask, jsonify, render_template, request
 from pygments import highlight
 from pygments.formatters import HtmlFormatter
-from pygments.lexers import PythonLexer
+from pygments.lexers import PythonLexer, get_lexer_by_name

 app = Flask(__name__)

@@ -34,14 +34,16 @@ def _extract_test_classes(test_code: str, cls_name: str) -> list:
     lines = test_code.splitlines(keepends=True)
     prefix = f"{cls_name}Test"
     result = []
-    for node in tree.body:
+    for node in tree.body:  # top-level definitions, preserves source order
         if isinstance(node, _ast.ClassDef) and node.name.startswith(prefix):
-            start = node.lineno - 1
-            end = node.end_lineno
-            result.append(
+            start = node.lineno - 1  # ast lineno is 1-indexed
+            end = node.end_lineno  # end_lineno is inclusive; slice is exclusive
+            result.append(
+                {
+                    "name": node.name,
+                    "code": "".join(lines[start:end]),
+                }
+            )
     return result

@@ -49,20 +51,22 @@ def _code_offset(code: str) -> int:
     """Number of leading newlines that Pygments will strip."""
     offset = 0
     for ch in code:
-        if ch == '\n':
+        if ch == "\n":
             offset += 1
         else:
             break
     return offset


-def highlight_code(code, highlight_lines=None):
+def highlight_code(code, highlight_lines=None, language="python"):
     """
-    Syntax highlight
+    Syntax highlight code with optional line highlighting.

     Args:
-        code: The
+        code: The source code to highlight
         highlight_lines: List of line numbers (1-indexed) to highlight
+        language: Programming language name (default: "python").
+            Passed to get_lexer_by_name; unknown names fall back to Python.

     Returns:
         HTML string with syntax highlighted code

@@ -70,7 +74,11 @@ def highlight_code(code, highlight_lines=None):
     formatter = HtmlFormatter(
         linenos="table", cssclass="source", hl_lines=highlight_lines or [], linenostart=1
     )
+    try:
+        lexer = get_lexer_by_name(language.lower())
+    except Exception:
+        lexer = PythonLexer()
+    return highlight(code, lexer, formatter)

@@ -82,7 +90,7 @@ def get_css():
 # Dataset adapter registration
 # ---------------------------------------------------------------------------

-from dataset_adapters import ...
+from adapters import REGISTRY, _set_helpers, register_hf_datasets  # noqa: E402

 # Inject helper functions into the adapters module (avoids circular imports)
 _set_helpers(highlight_code, _code_offset, _extract_test_classes)

@@ -100,6 +108,7 @@ def _get_adapter(dataset_slug: str):
 # Routes
 # ---------------------------------------------------------------------------

+
 @app.route("/")
 def index():
     """Main page showing list of all benchmark problems."""

@@ -109,15 +118,18 @@ def index():
 @app.route("/api/datasets")
 def get_datasets():
     """Return list of available datasets for the UI dataset selector."""
-    return jsonify(...)
+    return jsonify(
+        [
+            {
+                "slug": slug,
+                "display_name": adapter.display_name,
+                "problem_count": adapter.problem_count(),
+                "total_count": adapter.total_count,
+                "has_ground_truth": adapter.has_ground_truth,
+            }
+            for slug, adapter in REGISTRY.items()
+        ]
+    )


 @app.route("/api/<dataset_slug>/problems")
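The `/api/datasets` payload is what drives the dropdown's "1000 of 33050"-style labels: `problem_count` is the sampled size and `total_count` the original size. A runnable sketch with a stand-in adapter (`StubAdapter` is hypothetical; the real `REGISTRY` maps slugs to `DatasetAdapter` instances):

```python
class StubAdapter:
    """Stand-in for a DatasetAdapter: 1000 sampled rows out of 190000."""

    display_name = "BigVul"
    has_ground_truth = False

    def __init__(self, sampled: int, total: int):
        self._n = sampled
        self.total_count = total

    def problem_count(self) -> int:
        return self._n


REGISTRY = {"bigvul": StubAdapter(1000, 190000)}

# Same shape the route serializes with jsonify().
payload = [
    {
        "slug": slug,
        "display_name": a.display_name,
        "problem_count": a.problem_count(),
        "total_count": a.total_count,
        "has_ground_truth": a.has_ground_truth,
    }
    for slug, a in REGISTRY.items()
]
```

The client can then render "1000 of 190000" whenever `total_count` exceeds `problem_count`.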
benchmarks_analysis.csv (new file)
@@ -0,0 +1,38 @@
benchmark,category,year,size,languages,hf_dataset_id,data_access,visualization_complexity,influence,priority_score,batch,notes
MBPP+,Code Generation,2023,399,Python,evalplus/mbppplus,easy,simple,high,9,1,Natural companion to HumanEval+; same EvalPlus ecosystem
ClassEval,Code Generation,2023,100 classes (410 methods),Python,FudanSELab/ClassEval,easy,moderate,high,9,1,Class-level code generation with test classes
LiveCodeBench,Code Generation,2024,1055+,Python,livecodebench/code_generation_lite,easy,moderate,high,9,1,Continuously updated; contamination-free; high community interest
DebugBench,Code Editing/Debugging,2024,4253,"C++, Java, Python",Rtian/DebugBench,easy,moderate,high,8,1,Buggy code with implanted bugs; 4 categories; 18 minor types
HumanEval-X,Code Translation,2022,820 (164x5),"Python, C++, Java, JS, Go",THUDM/humaneval-x,easy,moderate,high,8,1,Same 164 problems in 5 languages with test cases
SWE-bench Lite,Code Editing,2024,300,Python,princeton-nlp/SWE-bench_Lite,easy,complex,very high,8,2,GitHub issue resolution; extremely high-profile
CodeContests,Code Generation,2022,13328,"C++, Python, Java",deepmind/code_contests,easy,moderate,high,8,2,AlphaCode benchmark; competitive programming
APPS,Code Generation,2021,10000,Python,codeparrot/apps,easy,moderate,high,7,2,Large-scale coding problems at 3 difficulty levels
CanItEdit,Code Editing,2023,105,Python,nuprl/CanItEdit,easy,simple,medium,7,2,Before/after code editing with dual instruction types
MBPP,Code Generation,2021,974,Python,google-research-datasets/mbpp,easy,simple,high,7,2,Original MBPP; foundational benchmark
DS-1000,Code Generation,2023,1000,Python,xlangai/DS-1000,easy,moderate,high,7,3,Data science library-specific problems (NumPy/Pandas/etc.)
CodeEditorBench,Code Editing,2024,7961,Multiple,m-a-p/CodeEditorBench,easy,moderate,medium,7,3,4 editing scenarios: debug/translate/polish/requirement switch
SAFIM,Code Completion,2024,17720,"Python, Java, C++, C#",gonglinyuan/safim,easy,moderate,medium,7,3,Syntax-aware fill-in-the-middle; 3 subtasks
BigVul,Vulnerability Detection,2020,190000,C/C++,bstee615/bigvul,easy,moderate,medium,6,3,CVE-linked vulnerability detection; 91 CWE types
RepoBench,Code Completion,2023,10000+,"Python, Java",tianyang/repobench-c,easy,complex,medium,6,3,Repo-level code completion with 3 sub-tasks
MultiPL-E,Code Generation/Translation,2023,HumanEval+MBPP in 22 langs,22 languages,nuprl/MultiPL-E,easy,moderate,medium,6,4,Translations of HumanEval/MBPP to 22 languages
DiverseVul,Vulnerability Detection,2023,350000+,C/C++,claudios/DiverseVul,easy,simple,medium,6,4,Large-scale vulnerability detection; 150 CWEs
PrimeVul,Vulnerability Detection,2024,236000+,C/C++,starsofchance/PrimeVul,easy,simple,medium,6,4,Highest quality labels for vuln detection
McEval,Code Generation,2024,16000,40 languages,Multilingual-Multimodal-NLP/McEval,easy,complex,medium,6,4,Massive language coverage
CodeSearchNet,Code Search/Summarization,2019,2000000,"Python, JS, Ruby, Go, Java, PHP",code-search-net/code_search_net,easy,moderate,medium,6,4,Foundational code search benchmark
xCodeEval,Multi-task,2023,25000000,11-17 languages,NTU-NLP-sg/xCodeEval,easy,very complex,medium,5,5,7 tasks; very large; complex format
Devign,Vulnerability Detection,2019,20756,C,google/code_x_glue_cc_defect_detection,easy,simple,medium,5,5,Function-level vulnerability identification
CrossVul,Vulnerability Detection,2021,9313,40+ languages,hitoshura25/crossvul,easy,simple,medium,5,5,Cross-language vulnerability detection
SWE-bench Verified,Code Editing,2024,500,Python,princeton-nlp/SWE-bench_Verified,easy,complex,high,5,5,Curated subset of SWE-bench
CoderEval,Code Generation,2023,460,"Python, Java",N/A (GitHub only),medium,complex,medium,4,deferred,Requires project-level context
NaturalCodeBench,Code Generation,2024,402,"Python, Java",N/A (GitHub only),medium,moderate,medium,4,deferred,Only dev set released (140 problems)
DevEval,Code Generation,2024,1874,Python,N/A (GitHub only),medium,complex,medium,4,deferred,Repository-level; complex dependencies
RunBugRun,Program Repair,2023,450000+,9 languages,N/A (GitHub/SQLite),hard,complex,medium,3,deferred,SQLite format; complex infrastructure
Defects4J,Program Repair,2014,854,Java,N/A (GitHub only),hard,very complex,high,3,deferred,Requires Java tooling; full project repos
ConDefects,Program Repair,2023,2879,"Java, Python",N/A (GitHub only),medium,moderate,medium,3,deferred,AtCoder buggy/fixed pairs
FixEval,Program Repair,2023,varies,"Python, Java",N/A (GitHub only),medium,moderate,low,3,deferred,Competitive programming fixes
TransCoder,Code Translation,2020,852,"Java, Python, C++",N/A (GitHub only),medium,moderate,medium,3,deferred,Facebook Research; unsupervised translation
AVATAR,Code Translation,2021,9515,"Java, Python",N/A (GitHub only),medium,moderate,low,3,deferred,Parallel Java-Python corpus
TypeEvalPy,Type Inference,2023,154,Python,N/A (GitHub only),medium,moderate,low,3,deferred,Niche; type inference evaluation
VJBench,Vulnerability Repair,2023,42,Java,N/A (GitHub only),hard,complex,low,2,deferred,Very small; requires Java tooling
SVEN,Vulnerability Detection,2023,1606,C/C++,N/A (GitHub only),medium,moderate,low,2,deferred,Small; security hardening focus
PyTER,Type Error Repair,2022,93,Python,N/A (Figshare),hard,complex,low,2,deferred,Very small; niche
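The analysis table above is plain CSV, so batch planning can be done with the standard library alone. A sketch filtering by the `batch` column (illustrative three-row subset of the real columns):

```python
import csv
import io

# Illustrative subset of benchmarks_analysis.csv columns.
CSV_TEXT = """\
benchmark,priority_score,batch
MBPP+,9,1
SWE-bench Lite,8,2
Defects4J,3,deferred
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
# Every field comes back as a string, so compare batch labels as strings.
batch1 = [r["benchmark"] for r in rows if r["batch"] == "1"]
deferred = [r["benchmark"] for r in rows if r["batch"] == "deferred"]
```

The same `batch` column mixes integers and the literal `deferred`, which is why string comparison (rather than `int()`) is the safe way to slice it.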
static/problem.css (new file)
@@ -0,0 +1,587 @@
.problem-header {
    display: flex;
    justify-content: space-between;
    align-items: center;
    margin-bottom: 15px;
}

.problem-meta {
    margin-bottom: 20px;
}

.meta-item {
    display: inline-block;
    margin-right: 15px;
    margin-bottom: 10px;
}

.meta-label {
    font-weight: 600;
    color: #7f8c8d;
    margin-right: 5px;
}

.meta-value {
    color: #2c3e50;
}

.task-selector {
    margin: 20px 0;
    display: flex;
    gap: 10px;
    flex-wrap: wrap;
}

.task-btn {
    padding: 10px 20px;
    background: #ecf0f1;
    border: 2px solid transparent;
    border-radius: 4px;
    cursor: pointer;
    transition: all 0.3s;
    font-size: 0.95rem;
}

.task-btn:hover {
    background: #bdc3c7;
}

.task-btn.active {
    background: #3498db;
    color: white;
    border-color: #2980b9;
}

.task-details {
    margin-top: 20px;
}

.task-section {
    margin-bottom: 25px;
    padding: 15px;
    background: #f8f9fa;
    border-left: 4px solid #3498db;
    border-radius: 4px;
}

.task-section h3 {
    margin-bottom: 10px;
    color: #2c3e50;
    font-size: 1.1rem;
}

.code-block {
    background: #f8f9fa;
    padding: 15px;
    border-radius: 4px;
    overflow-x: auto;
    font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace;
    font-size: 0.9rem;
    border: 1px solid #e1e4e8;
}

.task-items-list {
    list-style: none;
}

.task-items-list li {
    padding: 10px;
    margin-bottom: 8px;
    background: white;
    border-radius: 4px;
    border: 1px solid #e1e4e8;
}

.line-ref {
    display: inline-block;
    padding: 2px 8px;
    background: #3498db;
    color: white;
    border-radius: 3px;
    font-family: monospace;
    font-size: 0.85rem;
    margin-right: 8px;
}

.var-name {
    display: inline-block;
    padding: 2px 8px;
    background: #9b59b6;
    color: white;
    border-radius: 3px;
    font-family: monospace;
    font-size: 0.85rem;
}

.io-section {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 15px;
}

@media (max-width: 768px) {
    .io-section {
        grid-template-columns: 1fr;
    }
}

.navigation-hint {
    margin-top: 20px;
    padding: 15px;
    background: #e8f4f8;
    border-radius: 4px;
    color: #2c3e50;
    font-size: 0.9rem;
}

.test-code-section {
    margin-top: 20px;
}

/* Inline task visualization */
.code-with-tasks {
    position: relative;
}

.task-marker {
    display: inline-block;
    margin-left: 10px;
    padding: 2px 8px;
    background: #9b59b6;
    color: white;
    border-radius: 3px;
    font-size: 0.75rem;
    font-weight: 600;
    cursor: crosshair;
}

/* Coverage coloring on lineno spans */
td.linenos .normal.line-executed {
    background-color: #d4edda !important;
    color: #155724 !important;
}

td.linenos .normal.line-not-executed {
    background-color: #f8d7da !important;
    color: #721c24 !important;
}

/* Coverage legend */
.coverage-legend {
    margin: 10px 0;
    padding: 10px 15px;
    background: #f8f9fa;
    border-left: 4px solid #28a745;
    border-radius: 4px;
    font-size: 0.85rem;
    display: none;
}

.coverage-legend-item {
    display: inline-block;
    margin-right: 18px;
}

.coverage-swatch {
    display: inline-block;
    width: 12px;
    height: 12px;
    border-radius: 2px;
    margin-right: 4px;
    vertical-align: middle;
}

/* Ground truth answer badge */
.gt-answer {
    display: inline-block;
    margin-left: 10px;
    padding: 2px 8px;
    background: #17a2b8;
    color: white;
    border-radius: 3px;
    font-family: monospace;
    font-size: 0.82rem;
    font-weight: 600;
}

.gt-answer.loading {
    background: #6c757d;
}

/* SVG arrow overlay */
#arrow-overlay {
    position: absolute;
    top: 0;
    left: 0;
    width: 100%;
    height: 100%;
    pointer-events: none;
    overflow: visible;
    z-index: 10;
}

.exec-arrow {
    fill: none;
    stroke: #e67e22;
    stroke-width: 2.5;
    stroke-dasharray: none;
    opacity: 0.9;
}

.exec-arrow-head {
    fill: #e67e22;
    opacity: 0.9;
}

/* CRUXEval answer highlight */
.crux-answer {
    border-left: 4px solid #17a2b8 !important;
    background: #e8f6f8 !important;
}

/* Before/after diff view */
.diff-container {
    display: grid;
    grid-template-columns: 1fr 1fr;
    gap: 20px;
}

@media (max-width: 1024px) {
    .diff-container {
        grid-template-columns: 1fr;
    }
}

.diff-panel {
    overflow-x: auto;
}

.diff-panel h3 {
    margin-bottom: 10px;
    font-size: 1.1rem;
}

.diff-panel h3 .diff-label-buggy {
    color: #e74c3c;
}

.diff-panel h3 .diff-label-fixed {
    color: #27ae60;
}

.bug-info {
    margin-bottom: 15px;
    padding: 12px 15px;
    border-left: 4px solid #e74c3c;
    background: #fdf2f2;
    border-radius: 4px;
}

.bug-info .bug-category {
    display: inline-block;
    padding: 2px 8px;
    background: #e74c3c;
    color: white;
    border-radius: 3px;
    font-size: 0.82rem;
    font-weight: 600;
    margin-right: 8px;
}

.bug-info .bug-subtype {
    display: inline-block;
    padding: 2px 8px;
    background: #c0392b;
    color: white;
    border-radius: 3px;
    font-size: 0.82rem;
    font-weight: 600;
|
| 299 |
+
}
|
| 300 |
+
|
| 301 |
+
/* Multi-language view */
|
| 302 |
+
.lang-tabs {
|
| 303 |
+
display: flex;
|
| 304 |
+
gap: 0;
|
| 305 |
+
border-bottom: 2px solid #e1e4e8;
|
| 306 |
+
margin-bottom: 0;
|
| 307 |
+
}
|
| 308 |
+
|
| 309 |
+
.lang-tab {
|
| 310 |
+
padding: 10px 20px;
|
| 311 |
+
background: #f8f9fa;
|
| 312 |
+
border: 1px solid #e1e4e8;
|
| 313 |
+
border-bottom: none;
|
| 314 |
+
cursor: pointer;
|
| 315 |
+
font-size: 0.95rem;
|
| 316 |
+
font-weight: 500;
|
| 317 |
+
transition: all 0.2s;
|
| 318 |
+
border-radius: 4px 4px 0 0;
|
| 319 |
+
margin-right: 2px;
|
| 320 |
+
}
|
| 321 |
+
|
| 322 |
+
.lang-tab:hover {
|
| 323 |
+
background: #e8f4f8;
|
| 324 |
+
}
|
| 325 |
+
|
| 326 |
+
.lang-tab.active {
|
| 327 |
+
background: white;
|
| 328 |
+
border-bottom: 2px solid white;
|
| 329 |
+
margin-bottom: -2px;
|
| 330 |
+
color: #3498db;
|
| 331 |
+
font-weight: 600;
|
| 332 |
+
}
|
| 333 |
+
|
| 334 |
+
.lang-code-panel {
|
| 335 |
+
display: none;
|
| 336 |
+
}
|
| 337 |
+
|
| 338 |
+
.lang-code-panel.active {
|
| 339 |
+
display: block;
|
| 340 |
+
}
|
| 341 |
+
|
| 342 |
+
/* BigOBench complexity display */
|
| 343 |
+
.complexity-badges {
|
| 344 |
+
display: flex;
|
| 345 |
+
gap: 20px;
|
| 346 |
+
flex-wrap: wrap;
|
| 347 |
+
}
|
| 348 |
+
|
| 349 |
+
.complexity-item {
|
| 350 |
+
display: flex;
|
| 351 |
+
align-items: center;
|
| 352 |
+
gap: 10px;
|
| 353 |
+
}
|
| 354 |
+
|
| 355 |
+
.complexity-label {
|
| 356 |
+
font-weight: 600;
|
| 357 |
+
color: #7f8c8d;
|
| 358 |
+
font-size: 0.95rem;
|
| 359 |
+
}
|
| 360 |
+
|
| 361 |
+
.complexity-value {
|
| 362 |
+
display: inline-block;
|
| 363 |
+
padding: 6px 16px;
|
| 364 |
+
background: #2c3e50;
|
| 365 |
+
color: #f1c40f;
|
| 366 |
+
border-radius: 4px;
|
| 367 |
+
font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace;
|
| 368 |
+
font-size: 1.1rem;
|
| 369 |
+
font-weight: 600;
|
| 370 |
+
}
|
| 371 |
+
|
| 372 |
+
/* Diff view (GitHub-style table with line numbers) */
|
| 373 |
+
.diff-view {
|
| 374 |
+
font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace;
|
| 375 |
+
font-size: 0.85rem;
|
| 376 |
+
line-height: 1.5;
|
| 377 |
+
overflow-x: auto;
|
| 378 |
+
border: 1px solid #e1e4e8;
|
| 379 |
+
border-radius: 4px;
|
| 380 |
+
}
|
| 381 |
+
|
| 382 |
+
.diff-table {
|
| 383 |
+
border-collapse: collapse;
|
| 384 |
+
width: 100%;
|
| 385 |
+
}
|
| 386 |
+
|
| 387 |
+
.diff-table td {
|
| 388 |
+
padding: 0 8px;
|
| 389 |
+
white-space: pre;
|
| 390 |
+
vertical-align: top;
|
| 391 |
+
}
|
| 392 |
+
|
| 393 |
+
.diff-ln {
|
| 394 |
+
width: 1%;
|
| 395 |
+
min-width: 40px;
|
| 396 |
+
color: #959da5;
|
| 397 |
+
text-align: right;
|
| 398 |
+
user-select: none;
|
| 399 |
+
font-size: 0.8rem;
|
| 400 |
+
padding: 0 6px !important;
|
| 401 |
+
border-right: 1px solid #e1e4e8;
|
| 402 |
+
}
|
| 403 |
+
|
| 404 |
+
.diff-tr-add td { background: #e6ffec; }
|
| 405 |
+
.diff-td-add { color: #24292e; }
|
| 406 |
+
.diff-tr-add .diff-ln { background: #ccffd8; color: #22863a; }
|
| 407 |
+
|
| 408 |
+
.diff-tr-del td { background: #ffebe9; }
|
| 409 |
+
.diff-td-del { color: #24292e; }
|
| 410 |
+
.diff-tr-del .diff-ln { background: #ffd7d5; color: #cb2431; }
|
| 411 |
+
|
| 412 |
+
.diff-tr-ctx td { background: white; }
|
| 413 |
+
.diff-td-ctx { color: #586069; }
|
| 414 |
+
|
| 415 |
+
.diff-tr-hunk td {
|
| 416 |
+
background: #f1f8ff;
|
| 417 |
+
color: #0366d6;
|
| 418 |
+
font-weight: 600;
|
| 419 |
+
padding: 4px 8px;
|
| 420 |
+
}
|
| 421 |
+
|
| 422 |
+
.diff-tr-header td {
|
| 423 |
+
background: #fafbfc;
|
| 424 |
+
color: #6a737d;
|
| 425 |
+
font-weight: 600;
|
| 426 |
+
padding: 4px 8px;
|
| 427 |
+
border-bottom: 1px solid #e1e4e8;
|
| 428 |
+
}
|
| 429 |
+
|
| 430 |
+
/* Diff file sections (GitHub-style per-file headers) */
|
| 431 |
+
.diff-file-section {
|
| 432 |
+
margin-bottom: 16px;
|
| 433 |
+
border: 1px solid #d0d7de;
|
| 434 |
+
border-radius: 6px;
|
| 435 |
+
overflow: hidden;
|
| 436 |
+
}
|
| 437 |
+
|
| 438 |
+
.diff-file-section .diff-view {
|
| 439 |
+
border: none;
|
| 440 |
+
border-radius: 0;
|
| 441 |
+
}
|
| 442 |
+
|
| 443 |
+
.diff-file-header {
|
| 444 |
+
display: flex;
|
| 445 |
+
justify-content: space-between;
|
| 446 |
+
align-items: center;
|
| 447 |
+
padding: 8px 12px;
|
| 448 |
+
background: #f6f8fa;
|
| 449 |
+
border-bottom: 1px solid #d0d7de;
|
| 450 |
+
font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace;
|
| 451 |
+
font-size: 0.85rem;
|
| 452 |
+
}
|
| 453 |
+
|
| 454 |
+
.diff-file-path {
|
| 455 |
+
color: #24292f;
|
| 456 |
+
font-weight: 600;
|
| 457 |
+
word-break: break-all;
|
| 458 |
+
}
|
| 459 |
+
|
| 460 |
+
.diff-file-stats {
|
| 461 |
+
white-space: nowrap;
|
| 462 |
+
margin-left: 12px;
|
| 463 |
+
font-size: 0.8rem;
|
| 464 |
+
}
|
| 465 |
+
|
| 466 |
+
.diff-stat-add { color: #1a7f37; font-weight: 600; }
|
| 467 |
+
.diff-stat-del { color: #cf222e; font-weight: 600; margin-left: 6px; }
|
| 468 |
+
|
| 469 |
+
/* GitHub links bar */
|
| 470 |
+
.gh-links-bar {
|
| 471 |
+
display: flex;
|
| 472 |
+
gap: 12px;
|
| 473 |
+
align-items: center;
|
| 474 |
+
flex-wrap: wrap;
|
| 475 |
+
}
|
| 476 |
+
|
| 477 |
+
.gh-link {
|
| 478 |
+
display: inline-block;
|
| 479 |
+
padding: 6px 14px;
|
| 480 |
+
background: #f6f8fa;
|
| 481 |
+
border: 1px solid #d0d7de;
|
| 482 |
+
border-radius: 6px;
|
| 483 |
+
color: #0969da;
|
| 484 |
+
text-decoration: none;
|
| 485 |
+
font-size: 0.9rem;
|
| 486 |
+
font-weight: 500;
|
| 487 |
+
transition: background 0.15s, border-color 0.15s;
|
| 488 |
+
}
|
| 489 |
+
|
| 490 |
+
.gh-link:hover {
|
| 491 |
+
background: #ddf4ff;
|
| 492 |
+
border-color: #0969da;
|
| 493 |
+
}
|
| 494 |
+
|
| 495 |
+
/* Issue / problem statement */
|
| 496 |
+
.issue-statement {
|
| 497 |
+
line-height: 1.7;
|
| 498 |
+
padding: 10px;
|
| 499 |
+
white-space: pre-wrap;
|
| 500 |
+
word-wrap: break-word;
|
| 501 |
+
max-height: 500px;
|
| 502 |
+
overflow-y: auto;
|
| 503 |
+
background: #f8f9fa;
|
| 504 |
+
border: 1px solid #e1e4e8;
|
| 505 |
+
border-radius: 4px;
|
| 506 |
+
font-size: 0.9rem;
|
| 507 |
+
}
|
| 508 |
+
|
| 509 |
+
.test-id-list {
|
| 510 |
+
list-style: none;
|
| 511 |
+
padding: 0;
|
| 512 |
+
}
|
| 513 |
+
|
| 514 |
+
.test-id-list li {
|
| 515 |
+
padding: 4px 8px;
|
| 516 |
+
margin-bottom: 4px;
|
| 517 |
+
background: #f8f9fa;
|
| 518 |
+
border-radius: 3px;
|
| 519 |
+
font-family: monospace;
|
| 520 |
+
font-size: 0.82rem;
|
| 521 |
+
border-left: 3px solid #e74c3c;
|
| 522 |
+
}
|
| 523 |
+
|
| 524 |
+
.test-id-list li.pass-to-pass {
|
| 525 |
+
border-left-color: #27ae60;
|
| 526 |
+
}
|
| 527 |
+
|
| 528 |
+
/* Fill-in-the-Middle (SAFIM) view */
|
| 529 |
+
.fim-hole-marker {
|
| 530 |
+
display: inline-block;
|
| 531 |
+
padding: 4px 16px;
|
| 532 |
+
background: #e74c3c;
|
| 533 |
+
color: white;
|
| 534 |
+
border-radius: 4px;
|
| 535 |
+
font-family: monospace;
|
| 536 |
+
font-weight: 600;
|
| 537 |
+
font-size: 0.9rem;
|
| 538 |
+
margin: 4px 0;
|
| 539 |
+
}
|
| 540 |
+
|
| 541 |
+
.fim-answer {
|
| 542 |
+
padding: 15px;
|
| 543 |
+
background: #e8f6e8;
|
| 544 |
+
border-left: 4px solid #27ae60;
|
| 545 |
+
border-radius: 4px;
|
| 546 |
+
font-family: monospace;
|
| 547 |
+
font-size: 0.9rem;
|
| 548 |
+
}
|
| 549 |
+
|
| 550 |
+
.fim-merged-legend {
|
| 551 |
+
margin: 8px 0;
|
| 552 |
+
padding: 6px 12px;
|
| 553 |
+
background: #f8f9fa;
|
| 554 |
+
border-radius: 4px;
|
| 555 |
+
font-size: 0.85rem;
|
| 556 |
+
color: #555;
|
| 557 |
+
}
|
| 558 |
+
|
| 559 |
+
/* Vulnerability view */
|
| 560 |
+
.vuln-status {
|
| 561 |
+
display: inline-block;
|
| 562 |
+
padding: 4px 12px;
|
| 563 |
+
border-radius: 4px;
|
| 564 |
+
font-size: 0.85rem;
|
| 565 |
+
font-weight: 600;
|
| 566 |
+
}
|
| 567 |
+
|
| 568 |
+
.vuln-status-vulnerable {
|
| 569 |
+
background: #e74c3c;
|
| 570 |
+
color: white;
|
| 571 |
+
}
|
| 572 |
+
|
| 573 |
+
.vuln-status-patched {
|
| 574 |
+
background: #27ae60;
|
| 575 |
+
color: white;
|
| 576 |
+
}
|
| 577 |
+
|
| 578 |
+
.cwe-badge {
|
| 579 |
+
display: inline-block;
|
| 580 |
+
padding: 4px 12px;
|
| 581 |
+
background: #2c3e50;
|
| 582 |
+
color: #e74c3c;
|
| 583 |
+
border-radius: 4px;
|
| 584 |
+
font-family: monospace;
|
| 585 |
+
font-size: 0.85rem;
|
| 586 |
+
font-weight: 600;
|
| 587 |
+
}
|
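The `.diff-table` row classes above (`.diff-tr-add`, `.diff-tr-del`, `.diff-ln`, and the matching `.diff-td-*` cell classes) are consumed by row markup that the renderer in static/problem.js builds. As a minimal sketch of the expected shape (`renderDiffRow` and `escapeHtmlSketch` are illustrative names, not the repo's actual helpers, which may structure the table differently):

```javascript
// Hypothetical helper: escape text before embedding it in an HTML cell.
function escapeHtmlSketch(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: build one diff <tr> whose classes match problem.css.
// kind is 'add', 'del', or 'ctx'; oldLn/newLn may be null for add/del rows.
function renderDiffRow(kind, oldLn, newLn, text) {
  return `<tr class="diff-tr-${kind}">` +
    `<td class="diff-ln">${oldLn ?? ''}</td>` +
    `<td class="diff-ln">${newLn ?? ''}</td>` +
    `<td class="diff-td-${kind}">${escapeHtmlSketch(text)}</td>` +
    `</tr>`;
}
```

With two line-number cells per row, the `border-right` on `.diff-ln` and the green/red `.diff-tr-add`/`.diff-tr-del` backgrounds reproduce the familiar GitHub unified-diff gutter.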
@@ -0,0 +1,1313 @@
| 1 |
+
/* global problemIdx, datasetSlug, datasetName, hasGroundTruth, hasTasks */
|
| 2 |
+
|
| 3 |
+
function badgeClass(source) {
|
| 4 |
+
return 'badge-' + source.toLowerCase().replace(/[^a-z0-9]/g, '');
|
| 5 |
+
}
|
| 6 |
+
|
| 7 |
+
async function loadProblem() {
|
| 8 |
+
try {
|
| 9 |
+
const response = await fetch(`/api/${datasetSlug}/problem/${problemIdx}`);
|
| 10 |
+
const problem = await response.json();
|
| 11 |
+
|
| 12 |
+
if (problem.error) {
|
| 13 |
+
document.getElementById('problem-content').innerHTML =
|
| 14 |
+
'<div class="card"><p style="color: red;">Error: ' + problem.error + '</p></div>';
|
| 15 |
+
return;
|
| 16 |
+
}
|
| 17 |
+
|
| 18 |
+
renderProblem(problem);
|
| 19 |
+
} catch (error) {
|
| 20 |
+
document.getElementById('problem-content').innerHTML =
|
| 21 |
+
'<div class="card"><p style="color: red;">Error loading problem: ' + error.message + '</p></div>';
|
| 22 |
+
}
|
| 23 |
+
}
|
| 24 |
+
|
| 25 |
+
function renderProblem(problem) {
|
| 26 |
+
const container = document.getElementById('problem-content');
|
| 27 |
+
|
| 28 |
+
// Main problem info card (shared by all datasets)
|
| 29 |
+
let html = `
|
| 30 |
+
<div class="card">
|
| 31 |
+
<div class="problem-header">
|
| 32 |
+
<h2>${escapeHtml(problem.entry_point)}</h2>
|
| 33 |
+
<span class="badge ${badgeClass(problem.source)}">${escapeHtml(problem.source)}</span>
|
| 34 |
+
</div>
|
| 35 |
+
<div class="problem-meta">
|
| 36 |
+
<div class="meta-item">
|
| 37 |
+
<span class="meta-label">Task ID:</span>
|
| 38 |
+
<span class="meta-value">${escapeHtml(problem.task_id)}</span>
|
| 39 |
+
</div>
|
| 40 |
+
<div class="meta-item">
|
| 41 |
+
<span class="meta-label">Index:</span>
|
| 42 |
+
<span class="meta-value">${problem.idx}</span>
|
| 43 |
+
</div>
|
| 44 |
+
<div class="meta-item">
|
| 45 |
+
<span class="meta-label">Dataset:</span>
|
| 46 |
+
<span class="meta-value">${escapeHtml(datasetName)}</span>
|
| 47 |
+
</div>
|
| 48 |
+
${problem.inputs.length > 0 ? `
|
| 49 |
+
<div class="meta-item">
|
| 50 |
+
<span class="meta-label">Test Inputs:</span>
|
| 51 |
+
<span class="meta-value">${problem.inputs.length}</span>
|
| 52 |
+
</div>` : ''}
|
| 53 |
+
</div>
|
| 54 |
+
</div>
|
| 55 |
+
`;
|
| 56 |
+
|
| 57 |
+
// --- BigOBench view (problem description + per-solution code & complexity) ---
|
| 58 |
+
if (problem.solutions && problem.solutions.length > 0) {
|
| 59 |
+
// Problem description
|
| 60 |
+
if (problem.description) {
|
| 61 |
+
html += `
|
| 62 |
+
<div class="card">
|
| 63 |
+
<h2>Problem Statement</h2>
|
| 64 |
+
<pre class="code-block" style="white-space: pre-wrap;">${escapeHtml(problem.description)}</pre>
|
| 65 |
+
</div>
|
| 66 |
+
`;
|
| 67 |
+
}
|
| 68 |
+
|
| 69 |
+
// Each solution: code + complexity/language badges
|
| 70 |
+
problem.solutions.forEach((sol, i) => {
|
| 71 |
+
html += `
|
| 72 |
+
<div class="card">
|
| 73 |
+
<h2>Solution ${i + 1} <span style="font-size:0.8rem;color:#7f8c8d;font-weight:400;">${escapeHtml(sol.solution_id)}</span></h2>
|
| 74 |
+
<div class="complexity-badges" style="margin-bottom: 15px;">
|
| 75 |
+
`;
|
| 76 |
+
if (sol.language) {
|
| 77 |
+
html += `
|
| 78 |
+
<div class="complexity-item">
|
| 79 |
+
<span class="complexity-label">Language</span>
|
| 80 |
+
<span class="badge badge-info">${escapeHtml(sol.language)}</span>
|
| 81 |
+
</div>`;
|
| 82 |
+
}
|
| 83 |
+
if (sol.time_complexity) {
|
| 84 |
+
html += `
|
| 85 |
+
<div class="complexity-item">
|
| 86 |
+
<span class="complexity-label">Time</span>
|
| 87 |
+
<span class="complexity-value">${escapeHtml(sol.time_complexity)}</span>
|
| 88 |
+
</div>`;
|
| 89 |
+
}
|
| 90 |
+
if (sol.space_complexity) {
|
| 91 |
+
html += `
|
| 92 |
+
<div class="complexity-item">
|
| 93 |
+
<span class="complexity-label">Space</span>
|
| 94 |
+
<span class="complexity-value">${escapeHtml(sol.space_complexity)}</span>
|
| 95 |
+
</div>`;
|
| 96 |
+
}
|
| 97 |
+
html += `
|
| 98 |
+
</div>
|
| 99 |
+
<div class="code-with-tasks">
|
| 100 |
+
${sol.highlighted_code}
|
| 101 |
+
</div>
|
| 102 |
+
</div>
|
| 103 |
+
`;
|
| 104 |
+
});
|
| 105 |
+
|
| 106 |
+
// Navigation hint
|
| 107 |
+
html += `
|
| 108 |
+
<div class="navigation-hint">
|
| 109 |
+
<strong>Tip:</strong> Use the Previous/Next buttons at the top to browse through problems,
|
| 110 |
+
or return to the list view to filter by dataset source or search by name.
|
| 111 |
+
</div>
|
| 112 |
+
`;
|
| 113 |
+
|
| 114 |
+
container.innerHTML = html;
|
| 115 |
+
window.currentProblem = problem;
|
| 116 |
+
return;
|
| 117 |
+
}
|
| 118 |
+
|
| 119 |
+
// --- DebugBench before/after view (buggy → fixed) ---
|
| 120 |
+
if (problem.buggy_code !== undefined && problem.fixed_code !== undefined) {
|
| 121 |
+
// Problem description
|
| 122 |
+
if (problem.description) {
|
| 123 |
+
html += `
|
| 124 |
+
<div class="card">
|
| 125 |
+
<h2>Problem Statement</h2>
|
| 126 |
+
<pre class="code-block" style="white-space: pre-wrap;">${escapeHtml(problem.description)}</pre>
|
| 127 |
+
</div>
|
| 128 |
+
`;
|
| 129 |
+
}
|
| 130 |
+
|
| 131 |
+
// Bug info
|
| 132 |
+
html += `
|
| 133 |
+
<div class="card">
|
| 134 |
+
<h2>Bug Information</h2>
|
| 135 |
+
<div class="bug-info">
|
| 136 |
+
<span class="bug-category">${escapeHtml(problem.bug_category || '')}</span>
|
| 137 |
+
<span class="bug-subtype">${escapeHtml(problem.bug_subtype || '')}</span>
|
| 138 |
+
</div>
|
| 139 |
+
<p style="margin-top: 10px;">${escapeHtml(problem.bug_explanation || '')}</p>
|
| 140 |
+
`;
|
| 141 |
+
if (problem.difficulty) {
|
| 142 |
+
html += `<p style="margin-top: 8px; color: #7f8c8d;">Difficulty: <strong>${escapeHtml(problem.difficulty)}</strong></p>`;
|
| 143 |
+
}
|
| 144 |
+
html += `</div>`;
|
| 145 |
+
|
| 146 |
+
// Unified diff view of buggy → fixed
|
| 147 |
+
const unifiedDiff = computeUnifiedDiff(problem.buggy_code, problem.fixed_code);
|
| 148 |
+
html += `
|
| 149 |
+
<div class="card">
|
| 150 |
+
<h2>Changes</h2>
|
| 151 |
+
<div class="diff-view">${renderComputedDiff(unifiedDiff)}</div>
|
| 152 |
+
</div>
|
| 153 |
+
`;
|
| 154 |
+
|
| 155 |
+
// Side-by-side buggy/fixed code
|
| 156 |
+
html += `
|
| 157 |
+
<div class="card">
|
| 158 |
+
<h2>Full Code Comparison</h2>
|
| 159 |
+
<div class="diff-container">
|
| 160 |
+
<div class="diff-panel">
|
| 161 |
+
<h3><span class="diff-label-buggy">Before</span></h3>
|
| 162 |
+
<div class="code-with-tasks">${problem.buggy_highlighted_code}</div>
|
| 163 |
+
</div>
|
| 164 |
+
<div class="diff-panel">
|
| 165 |
+
<h3><span class="diff-label-fixed">After</span></h3>
|
| 166 |
+
<div class="code-with-tasks">${problem.fixed_highlighted_code}</div>
|
| 167 |
+
</div>
|
| 168 |
+
</div>
|
| 169 |
+
</div>
|
| 170 |
+
`;
|
| 171 |
+
|
| 172 |
+
// Examples
|
| 173 |
+
if (problem.examples && problem.examples.length > 0) {
|
| 174 |
+
html += `<div class="card"><h2>Examples</h2>`;
|
| 175 |
+
problem.examples.forEach((ex, i) => {
|
| 176 |
+
html += `<pre class="code-block" style="margin-bottom: 10px; white-space: pre-wrap;">${escapeHtml(ex)}</pre>`;
|
| 177 |
+
});
|
| 178 |
+
html += `</div>`;
|
| 179 |
+
}
|
| 180 |
+
|
| 181 |
+
html += `
|
| 182 |
+
<div class="navigation-hint">
|
| 183 |
+
<strong>Tip:</strong> Use the Previous/Next buttons at the top to browse through problems,
|
| 184 |
+
or return to the list view to filter by dataset source or search by name.
|
| 185 |
+
</div>
|
| 186 |
+
`;
|
| 187 |
+
|
| 188 |
+
container.innerHTML = html;
|
| 189 |
+
window.currentProblem = problem;
|
| 190 |
+
return;
|
| 191 |
+
}
|
| 192 |
+
|
| 193 |
+
// --- HumanEval-X / HumanEvalPack multi-language view ---
|
| 194 |
+
if (problem.lang_solutions && problem.lang_solutions.length > 0) {
|
| 195 |
+
// Check if this is HumanEvalPack (has buggy_code in solutions)
|
| 196 |
+
const hasBuggy = problem.lang_solutions.some(sol => sol.buggy_code);
|
| 197 |
+
|
| 198 |
+
// Bug info (HumanEvalPack only)
|
| 199 |
+
if (hasBuggy && (problem.bug_type || problem.failure_symptoms)) {
|
| 200 |
+
html += `
|
| 201 |
+
<div class="card">
|
| 202 |
+
<h2>Bug Information</h2>
|
| 203 |
+
<div class="bug-info">
|
| 204 |
+
${problem.bug_type ? `<span class="bug-category">${escapeHtml(problem.bug_type)}</span>` : ''}
|
| 205 |
+
${problem.failure_symptoms ? `<span class="bug-subtype">${escapeHtml(problem.failure_symptoms)}</span>` : ''}
|
| 206 |
+
</div>
|
| 207 |
+
</div>
|
| 208 |
+
`;
|
| 209 |
+
}
|
| 210 |
+
|
| 211 |
+
// Language tabs with code panels
|
| 212 |
+
html += `
|
| 213 |
+
<div class="card">
|
| 214 |
+
<h2>Source Code</h2>
|
| 215 |
+
`;
|
| 216 |
+
|
| 217 |
+
// Buggy/Canonical toggle for HumanEvalPack
|
| 218 |
+
if (hasBuggy) {
|
| 219 |
+
html += `
|
| 220 |
+
<div class="lang-tabs" id="code-mode-tabs" style="margin-bottom: 10px;">
|
| 221 |
+
<button class="lang-tab active" onclick="toggleCodeMode('canonical')">Canonical</button>
|
| 222 |
+
<button class="lang-tab" onclick="toggleCodeMode('buggy')">Buggy</button>
|
| 223 |
+
</div>
|
| 224 |
+
`;
|
| 225 |
+
}
|
| 226 |
+
|
| 227 |
+
html += `<div class="lang-tabs" id="lang-tabs">`;
|
| 228 |
+
problem.lang_solutions.forEach((sol, i) => {
|
| 229 |
+
const label = sol.language_label || sol.language;
|
| 230 |
+
html += `<button class="lang-tab ${i === 0 ? 'active' : ''}" onclick="showLangTab(${i})">${escapeHtml(label)}</button>`;
|
| 231 |
+
});
|
| 232 |
+
html += `</div>`;
|
| 233 |
+
|
| 234 |
+
problem.lang_solutions.forEach((sol, i) => {
|
| 235 |
+
html += `
|
| 236 |
+
<div class="lang-code-panel ${i === 0 ? 'active' : ''}" id="lang-panel-${i}">
|
| 237 |
+
<div class="code-with-tasks" id="lang-code-canonical-${i}">${sol.highlighted_code}</div>
|
| 238 |
+
${sol.buggy_code ? `<div class="code-with-tasks" id="lang-code-buggy-${i}" style="display:none;">${sol.buggy_highlighted_code}</div>` : ''}
|
| 239 |
+
</div>
|
| 240 |
+
`;
|
| 241 |
+
});
|
| 242 |
+
html += `</div>`;
|
| 243 |
+
|
| 244 |
+
// Test suite for current language
|
| 245 |
+
html += `<div class="card" id="lang-test-container">`;
|
| 246 |
+
if (problem.lang_solutions[0].test) {
|
| 247 |
+
html += `<h2>Test Suite</h2><pre class="code-block">${escapeHtml(problem.lang_solutions[0].test)}</pre>`;
|
| 248 |
+
}
|
| 249 |
+
html += `</div>`;
|
| 250 |
+
|
| 251 |
+
// Description
|
| 252 |
+
if (problem.description) {
|
| 253 |
+
html += `
|
| 254 |
+
<div class="card">
|
| 255 |
+
<h2>Problem Description</h2>
|
| 256 |
+
<div style="padding: 10px; line-height: 1.6; white-space: pre-wrap;">${escapeHtml(problem.description)}</div>
|
| 257 |
+
</div>
|
| 258 |
+
`;
|
| 259 |
+
}
|
| 260 |
+
|
| 261 |
+
html += `
|
| 262 |
+
<div class="navigation-hint">
|
| 263 |
+
<strong>Tip:</strong> Use the Previous/Next buttons at the top to browse through problems,
|
| 264 |
+
or return to the list view to filter by dataset source or search by name.
|
| 265 |
+
</div>
|
| 266 |
+
`;
|
| 267 |
+
|
| 268 |
+
container.innerHTML = html;
|
| 269 |
+
window.currentProblem = problem;
|
| 270 |
+
window._currentCodeMode = 'canonical';
|
| 271 |
+
return;
|
| 272 |
+
}
|
| 273 |
+
|
| 274 |
+
// --- SWE-bench / CommitBench diff view (unified diff patch) ---
|
| 275 |
+
if (problem.patch !== undefined) {
|
| 276 |
+
// GitHub links bar (SWE-bench variants)
|
| 277 |
+
const ghLinks = [];
|
| 278 |
+
if (problem.repo_url) ghLinks.push(`<a href="${escapeHtml(problem.repo_url)}" target="_blank" class="gh-link">Repository</a>`);
|
| 279 |
+
if (problem.issue_url) ghLinks.push(`<a href="${escapeHtml(problem.issue_url)}" target="_blank" class="gh-link">Issue</a>`);
|
| 280 |
+
if (problem.commit_url) ghLinks.push(`<a href="${escapeHtml(problem.commit_url)}" target="_blank" class="gh-link">Base Commit</a>`);
|
| 281 |
+
if (ghLinks.length > 0) {
|
| 282 |
+
            html += `<div class="card gh-links-bar">${ghLinks.join('')}</div>`;
        }

        // Metadata badges (version, date)
        const metaBadges = [];
        if (problem.version) metaBadges.push(`<span class="badge badge-info">v${escapeHtml(problem.version)}</span>`);
        if (problem.created_at) metaBadges.push(`<span class="badge badge-info">${escapeHtml(problem.created_at.split('T')[0])}</span>`);
        if (problem.commit_hash) metaBadges.push(`<span class="badge badge-info">${escapeHtml(problem.commit_hash.substring(0, 12))}</span>`);
        if (problem.diff_languages) metaBadges.push(`<span class="badge badge-info">${escapeHtml(problem.diff_languages)}</span>`);
        if (metaBadges.length > 0) {
            html += `<div style="margin-bottom: 15px;">${metaBadges.join(' ')}</div>`;
        }

        // Problem statement (issue text / commit message)
        if (problem.description) {
            html += `
                <div class="card">
                    <h2>${problem.issue_url ? 'Issue Description' : 'Description'}</h2>
                    <div class="issue-statement">${escapeHtml(problem.description)}</div>
                </div>
            `;
        }

        // Render unified diff with per-file sections
        html += renderDiffFiles(problem.patch, 'Solution Patch');

        // Test patch if available
        if (problem.test_patch) {
            html += renderDiffFiles(problem.test_patch, 'Test Patch');
        }

        // Failing tests
        if (problem.fail_to_pass && problem.fail_to_pass.length > 0) {
            html += `<div class="card"><h2>Tests: Fail → Pass</h2><ul class="test-id-list">`;
            problem.fail_to_pass.forEach(t => {
                html += `<li>${escapeHtml(t)}</li>`;
            });
            html += `</ul></div>`;
        }

        // Hints
        if (problem.hints) {
            html += `
                <div class="card">
                    <h2>Hints</h2>
                    <div class="issue-statement">${escapeHtml(problem.hints)}</div>
                </div>
            `;
        }

        html += `
            <div class="navigation-hint">
                <strong>Tip:</strong> Use the Previous/Next buttons at the top to browse through problems,
                or return to the list view to filter by dataset source or search by name.
            </div>
        `;

        container.innerHTML = html;
        window.currentProblem = problem;
        return;
    }

    // --- SAFIM Fill-in-the-Middle view ---
    if (problem.fim_ground_truth !== undefined) {
        // Tab bar: "With Gap" | "Completed" | "Completion Only"
        html += `
            <div class="card">
                <h2>Fill-in-the-Middle</h2>
                <div class="lang-tabs" id="fim-tabs">
                    <button class="lang-tab" onclick="showFimTab(0)">With Gap</button>
                    <button class="lang-tab active" onclick="showFimTab(1)">Completed</button>
                    <button class="lang-tab" onclick="showFimTab(2)">Completion Only</button>
                </div>
                <div class="lang-code-panel" id="fim-panel-0">
                    <div class="code-with-tasks">${problem.highlighted_code}</div>
                </div>
                <div class="lang-code-panel active" id="fim-panel-1">
                    <div class="fim-merged-legend">
                        <span class="coverage-swatch" style="background:#ffffcc; border:1px solid #ccc;"></span>
                        Inserted completion (lines ${problem.fim_gt_start_line}–${problem.fim_gt_end_line})
                    </div>
                    <div class="code-with-tasks">${problem.fim_merged_highlighted}</div>
                </div>
                <div class="lang-code-panel" id="fim-panel-2">
                    <div class="fim-answer">${problem.fim_ground_truth_highlighted || escapeHtml(problem.fim_ground_truth)}</div>
                </div>
            </div>
        `;

        html += `
            <div class="navigation-hint">
                <strong>Tip:</strong> Use the Previous/Next buttons at the top to browse through problems,
                or return to the list view to filter by dataset source or search by name.
            </div>
        `;
        container.innerHTML = html;
        window.currentProblem = problem;
        return;
    }

    // --- Vulnerability view (BigVul, DiverseVul, PrimeVul) ---
    if (problem.vulnerable_code !== undefined || problem.is_vulnerable !== undefined) {
        // Vulnerability status and CWE info
        const isVuln = problem.is_vulnerable;
        html += `
            <div class="card">
                <h2>Vulnerability Information</h2>
                <div style="margin-bottom: 10px;">
                    <span class="vuln-status ${isVuln ? 'vuln-status-vulnerable' : 'vuln-status-patched'}">
                        ${isVuln ? 'Vulnerable' : 'Patched'}
                    </span>
                    ${problem.cwe_id ? `<span class="cwe-badge">${escapeHtml(problem.cwe_id)}</span>` : ''}
                    ${problem.cve_id ? `<span class="badge badge-info">${escapeHtml(problem.cve_id)}</span>` : ''}
                    ${problem.project ? `<span class="badge badge-info">${escapeHtml(problem.project)}</span>` : ''}
                </div>
                ${problem.description ? `<p style="margin-top: 10px; color: #555;">${escapeHtml(problem.description).substring(0, 500)}</p>` : ''}
            </div>
        `;

        // Show code with vuln/patched side-by-side if both available
        if (problem.vulnerable_code && problem.patched_code) {
            const vulnDiff = computeUnifiedDiff(problem.vulnerable_code, problem.patched_code);
            html += `
                <div class="card">
                    <h2>Changes</h2>
                    <div class="diff-view">${renderComputedDiff(vulnDiff)}</div>
                </div>
            `;
            html += `
                <div class="card">
                    <h2>Full Code Comparison</h2>
                    <div class="diff-container">
                        <div class="diff-panel">
                            <h3><span class="diff-label-buggy">Vulnerable</span></h3>
                            <div class="code-with-tasks">${problem.vulnerable_highlighted_code}</div>
                        </div>
                        <div class="diff-panel">
                            <h3><span class="diff-label-fixed">Patched</span></h3>
                            <div class="code-with-tasks">${problem.patched_highlighted_code}</div>
                        </div>
                    </div>
                </div>
            `;
        } else {
            // Single code view
            html += `
                <div class="card">
                    <h2>Source Code</h2>
                    <div class="code-with-tasks">${problem.highlighted_code}</div>
                </div>
            `;
        }

        html += `
            <div class="navigation-hint">
                <strong>Tip:</strong> Use the Previous/Next buttons at the top to browse through problems,
                or return to the list view to filter by dataset source or search by name.
            </div>
        `;
        container.innerHTML = html;
        window.currentProblem = problem;
        return;
    }

    // Source Code card
    html += `
        <div class="card">
            <h2>Source Code</h2>
            <div class="code-with-tasks" id="code-container">
                ${problem.highlighted_code}
            </div>
        </div>
    `;

    // --- Non-DREval (simple) view ---
    if (!hasTasks) {
        // Show description if available (e.g. LiveCodeBench, MBPP+, ClassEval)
        if (problem.description) {
            html += `
                <div class="card">
                    <h2>Problem Description</h2>
                    <div style="padding: 10px; line-height: 1.6; white-space: pre-wrap;">${escapeHtml(problem.description)}</div>
                </div>
            `;
        }

        // Show difficulty, contest date, tags, rating if available
        if (problem.difficulty || problem.contest_date || problem.tags || problem.cf_rating) {
            let metaHtml = '';
            if (problem.difficulty) {
                metaHtml += `<span class="badge badge-info">Difficulty: ${escapeHtml(problem.difficulty)}</span>`;
            }
            if (problem.cf_rating) {
                metaHtml += `<span class="badge badge-info">Rating: ${problem.cf_rating}</span>`;
            }
            if (problem.contest_date) {
                metaHtml += `<span class="badge badge-info">Date: ${escapeHtml(problem.contest_date.split('T')[0])}</span>`;
            }
            if (problem.tags && problem.tags.length > 0) {
                problem.tags.forEach(tag => {
                    metaHtml += `<span class="badge badge-info">${escapeHtml(tag)}</span>`;
                });
            }
            html += `<div style="margin-bottom: 15px;">${metaHtml}</div>`;
        }

        // Show inputs/outputs if available
        if (problem.inputs && problem.inputs.length > 0) {
            html += `<div class="card"><h2>Inputs & Outputs</h2>`;
            problem.inputs.forEach((inp, i) => {
                const out = (problem.outputs && problem.outputs[i]) || '';
                html += `
                    <div class="io-section" style="margin-bottom: 15px;">
                        <div class="task-section">
                            <h3>Input ${i + 1}</h3>
                            <pre class="code-block">${escapeHtml(inp)}</pre>
                        </div>
                        <div class="task-section">
                            <h3>Output</h3>
                            <pre class="code-block">${escapeHtml(out)}</pre>
                        </div>
                    </div>
                `;
            });
            html += `</div>`;
        }

        // Show test suite if available
        if (problem.test) {
            html += `
                <div class="card">
                    <h2>Test Suite</h2>
                    <pre class="code-block">${escapeHtml(problem.test)}</pre>
                </div>
            `;
        }

        // Navigation hint
        html += `
            <div class="navigation-hint">
                <strong>Tip:</strong> Use the Previous/Next buttons at the top to browse through problems,
                or return to the list view to filter by dataset source or search by name.
            </div>
        `;

        container.innerHTML = html;
        window.currentProblem = problem;
        return;
    }

    // --- CRUXEval task view (tasks have given/predict fields, no task_items) ---
    if (problem.tasks.length > 0 && problem.tasks[0].given !== undefined) {
        // Task selector
        html += `
            <div class="card">
                <h2>Tasks</h2>
                <div class="task-selector" id="task-selector">
        `;
        problem.tasks.forEach((task, idx) => {
            html += `
                <button class="task-btn ${idx === 0 ? 'active' : ''}"
                        onclick="showCruxTask(${idx})">
                    ${escapeHtml(task.name)}
                </button>
            `;
        });
        html += `
                </div>
                <div id="task-content"></div>
            </div>
        `;

        // Navigation hint
        html += `
            <div class="navigation-hint">
                <strong>Tip:</strong> Use the Previous/Next buttons at the top to browse through problems,
                or return to the list view to filter by dataset source or search by name.
            </div>
        `;

        container.innerHTML = html;
        window.currentProblem = problem;
        showCruxTask(0);
        return;
    }

    // --- DREval (full) view with tasks, coverage, arrows ---
    // Rebuild html cleanly with coverage legend and SVG overlay
    html = `
        <div class="card">
            <div class="problem-header">
                <h2>${escapeHtml(problem.entry_point)}</h2>
                <span class="badge ${badgeClass(problem.source)}">${escapeHtml(problem.source)}</span>
            </div>
            <div class="problem-meta">
                <div class="meta-item">
                    <span class="meta-label">Task ID:</span>
                    <span class="meta-value">${escapeHtml(problem.task_id)}</span>
                </div>
                <div class="meta-item">
                    <span class="meta-label">Index:</span>
                    <span class="meta-value">${problem.idx}</span>
                </div>
                <div class="meta-item">
                    <span class="meta-label">Dataset:</span>
                    <span class="meta-value">${escapeHtml(datasetName)}</span>
                </div>
                <div class="meta-item">
                    <span class="meta-label">Test Inputs:</span>
                    <span class="meta-value">${problem.inputs.length}</span>
                </div>
            </div>
        </div>

        <div class="card">
            <h2>Source Code</h2>
            <div class="coverage-legend" id="coverage-legend">
                <strong>Coverage:</strong>
                <span class="coverage-legend-item">
                    <span class="coverage-swatch" style="background:#d4edda; border:1px solid #28a745;"></span>
                    Executed
                </span>
                <span class="coverage-legend-item">
                    <span class="coverage-swatch" style="background:#f8d7da; border:1px solid #dc3545;"></span>
                    Not executed
                </span>
            </div>
            <div class="code-with-tasks" id="code-container">
                ${problem.highlighted_code}
                <svg id="arrow-overlay" xmlns="http://www.w3.org/2000/svg">
                    <defs>
                        <marker id="arrowhead" markerWidth="8" markerHeight="6"
                                refX="8" refY="3" orient="auto">
                            <polygon points="0 0, 8 3, 0 6" class="exec-arrow-head"/>
                        </marker>
                    </defs>
                </svg>
            </div>
        </div>
    `;

    // Task selector
    html += `
        <div class="card">
            <h2>Test Cases & Tasks</h2>
            <p>Select a test input to view associated reasoning tasks:</p>
            <div class="task-selector" id="task-selector">
    `;

    problem.tasks.forEach((task, idx) => {
        html += `
            <button class="task-btn ${idx === 0 ? 'active' : ''}"
                    onclick="showTask(${idx})">
                Input ${task.input_idx + 1}
            </button>
        `;
    });

    html += `
            </div>
            <div id="task-content"></div>
        </div>
    `;

    // Navigation hint
    html += `
        <div class="navigation-hint">
            <strong>Tip:</strong> Use the Previous/Next buttons at the top to browse through problems,
            or return to the list view to filter by dataset source or search by name.
        </div>
    `;

    container.innerHTML = html;

    // Store problem data globally
    window.currentProblem = problem;

    // Show first task by default
    showTask(0);
}

function injectTaskMarkers(taskItems) {
    const codePre = document.querySelector('.source .code pre');

    // Save the pristine original innerHTML once, before any modification.
    if (codePre && !window._codePreOriginalHtml) {
        window._codePreOriginalHtml = codePre.innerHTML;
    }

    // Invalidate span cache (rebuilt lazily on next arrow draw)
    window._linenoSpanCache = null;

    // Store current task items so applyCoverage can re-add markers after wrapping.
    window._currentTaskItems = taskItems || [];

    // Reset code pre to original, then add markers from scratch.
    if (codePre && window._codePreOriginalHtml) {
        codePre.innerHTML = window._codePreOriginalHtml;
    }

    if (!taskItems || taskItems.length === 0) {
        return;
    }

    // Group tasks by line number
    const tasksByLine = {};
    taskItems.forEach(item => {
        if (!tasksByLine[item.lineno]) tasksByLine[item.lineno] = [];
        tasksByLine[item.lineno].push(item.var);
    });

    // Inject task marker badges into the code pre
    if (!codePre) return;
    const codeLines = codePre.innerHTML.split('\n');
    codePre.innerHTML = codeLines.map((line, idx) => {
        const lineNum = idx + 1;
        if (tasksByLine[lineNum] && line.trim() !== '') {
            const vars = tasksByLine[lineNum];
            return line + `<span class="task-marker" data-lineno="${lineNum}" data-vars="${escapeHtml(vars.join(', '))}">${escapeHtml(vars.join(', '))}</span>`;
        }
        return line;
    }).join('\n');
}
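
// The grouping step above is a pure data transform and can be exercised
// without a DOM. A minimal sketch (helper name `groupTasksByLine` is
// hypothetical, extracted here only for illustration; the `taskItems`
// shape `{lineno, var}` is assumed from this function):
//
// function groupTasksByLine(taskItems) {
//     const tasksByLine = {};
//     (taskItems || []).forEach(item => {
//         if (!tasksByLine[item.lineno]) tasksByLine[item.lineno] = [];
//         tasksByLine[item.lineno].push(item.var);
//     });
//     return tasksByLine;
// }
//
// groupTasksByLine([{ lineno: 3, var: 'x' }, { lineno: 3, var: 'y' }])
// yields { 3: ['x', 'y'] }, which is why a single marker badge can list
// several variables for one line.

```javascript
// Standalone sketch of the grouping rule used by injectTaskMarkers.
// Pure data transform: no DOM, no globals.
function groupTasksByLine(taskItems) {
    const tasksByLine = {};
    (taskItems || []).forEach(item => {
        if (!tasksByLine[item.lineno]) tasksByLine[item.lineno] = [];
        tasksByLine[item.lineno].push(item.var);
    });
    return tasksByLine;
}

const grouped = groupTasksByLine([
    { lineno: 3, var: 'x' },
    { lineno: 3, var: 'y' },
    { lineno: 7, var: 'acc' },
]);
console.log(grouped[3].join(', ')); // "x, y"
```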

function applyCoverage(coverageSet, totalLines) {
    // Remove previous coverage classes from lineno spans.
    document.querySelectorAll('td.linenos .normal').forEach(el => {
        el.classList.remove('line-executed', 'line-not-executed');
    });

    if (!coverageSet) {
        const legend = document.getElementById('coverage-legend');
        if (legend) legend.style.display = 'none';
        return;
    }

    const legend = document.getElementById('coverage-legend');
    if (legend) legend.style.display = 'block';

    // Color lineno spans only.
    document.querySelectorAll('td.linenos .normal').forEach(span => {
        const lineNum = parseInt(span.textContent.trim(), 10);
        if (!isNaN(lineNum) && lineNum <= totalLines) {
            span.classList.add(coverageSet.has(lineNum) ? 'line-executed' : 'line-not-executed');
        }
    });
}

// Global map: lineno -> list of next line numbers (1-indexed; -1 = end of trace)
window._nextLinesMap = {};

async function loadAndApplyGroundTruth(problemIdx, inputIdx, taskItems) {
    // Show "loading" placeholders on all task items
    taskItems.forEach(item => {
        const el = document.getElementById(`gt-${item.lineno}-${item.var}`);
        if (el) { el.textContent = '…'; el.className = 'gt-answer loading'; }
    });

    // Clear next-lines data from previous input
    window._nextLinesMap = {};

    try {
        const resp = await fetch(`/api/${datasetSlug}/problem/${problemIdx}/ground_truth/${inputIdx}`);
        const gt = await resp.json();

        if (gt.status !== 'ok') {
            taskItems.forEach(item => {
                const el = document.getElementById(`gt-${item.lineno}-${item.var}`);
                if (el) { el.textContent = gt.status === 'error' ? '(exec error)' : '(unavailable)'; el.className = 'gt-answer'; }
            });
            applyCoverage(null, 0);
            return;
        }

        // Apply coverage highlighting
        const coverageSet = new Set(gt.coverage);
        applyCoverage(coverageSet, gt.total_lines);

        // Fill in variable answers
        const answerMap = {};
        gt.variable_answers.forEach(a => {
            answerMap[`${a.lineno}-${a.var}`] = a.answer_str;
        });
        taskItems.forEach(item => {
            const el = document.getElementById(`gt-${item.lineno}-${item.var}`);
            if (el) {
                const answer = answerMap[`${item.lineno}-${item.var}`] || '(not available)';
                el.textContent = answer;
                el.className = 'gt-answer';
            }
        });

        // Store next-lines data for arrow visualization
        if (gt.next_lines_answers) {
            gt.next_lines_answers.forEach(a => {
                window._nextLinesMap[a.lineno] = a.next_lines;
            });
        }

        // Attach hover handlers to task-marker spans now that we have next-lines data
        attachArrowHoverHandlers();

    } catch (e) {
        taskItems.forEach(item => {
            const el = document.getElementById(`gt-${item.lineno}-${item.var}`);
            if (el) { el.textContent = '(error)'; el.className = 'gt-answer'; }
        });
    }
}

// Cache of lineNum → DOM span, rebuilt whenever injectTaskMarkers runs.
window._linenoSpanCache = null;

function buildLinenoSpanCache(container) {
    const cache = {};
    container.querySelectorAll('td.linenos .normal').forEach(span => {
        const n = parseInt(span.textContent.trim(), 10);
        if (!isNaN(n)) cache[n] = span;
    });
    window._linenoSpanCache = cache;
}
|
| 804 |
+
|
| 805 |
+
/**
|
| 806 |
+
* Get the bounding rect of the lineno span for a given 1-indexed line number,
|
| 807 |
+
* relative to the code container element. Uses a cached span map.
|
| 808 |
+
*/
|
| 809 |
+
function getLinenoSpanRect(lineNum, container) {
|
| 810 |
+
if (!window._linenoSpanCache) buildLinenoSpanCache(container);
|
| 811 |
+
const span = window._linenoSpanCache[lineNum];
|
| 812 |
+
if (!span) return null;
|
| 813 |
+
const spanRect = span.getBoundingClientRect();
|
| 814 |
+
const containerRect = container.getBoundingClientRect();
|
| 815 |
+
return {
|
| 816 |
+
top: spanRect.top - containerRect.top + container.scrollTop,
|
| 817 |
+
bottom: spanRect.bottom - containerRect.top + container.scrollTop,
|
| 818 |
+
left: spanRect.left - containerRect.left,
|
| 819 |
+
right: spanRect.right - containerRect.left,
|
| 820 |
+
width: spanRect.width,
|
| 821 |
+
height: spanRect.height,
|
| 822 |
+
midY: (spanRect.top + spanRect.bottom) / 2 - containerRect.top + container.scrollTop,
|
| 823 |
+
};
|
| 824 |
+
}
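
// The rect math above is plain coordinate translation. A DOM-free sketch
// (helper name `toContainerCoords` is hypothetical; the plain objects stand
// in for DOMRects with the same fields):
//
// viewport y=120..136 inside a container whose top is at viewport y=100,
// scrolled down by 50px, lands at container-local y=70..86.

```javascript
// Translate a viewport-relative rect into container-relative coordinates,
// compensating for the container's scroll offset — the same arithmetic
// getLinenoSpanRect applies to real DOMRects.
function toContainerCoords(spanRect, containerRect, scrollTop) {
    return {
        top: spanRect.top - containerRect.top + scrollTop,
        bottom: spanRect.bottom - containerRect.top + scrollTop,
        left: spanRect.left - containerRect.left,
        right: spanRect.right - containerRect.left,
        midY: (spanRect.top + spanRect.bottom) / 2 - containerRect.top + scrollTop,
    };
}

const r = toContainerCoords(
    { top: 120, bottom: 136, left: 10, right: 40 },  // span, viewport coords
    { top: 100, left: 0 },                           // container, viewport coords
    50                                               // container.scrollTop
);
console.log(r.top, r.midY); // 70 78
```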

/**
 * Draw arrows from sourceLine to each of the targetLines in the SVG overlay.
 * Lines are 1-indexed. -1 means "end of execution" (no arrow drawn).
 */
function drawArrows(sourceLineNum, targetLineNums) {
    const container = document.getElementById('code-container');
    const svg = document.getElementById('arrow-overlay');
    if (!container || !svg) return;

    // Remove previous arrows (but keep defs)
    svg.querySelectorAll('.arrow-path').forEach(el => el.remove());

    const srcRect = getLinenoSpanRect(sourceLineNum, container);
    if (!srcRect) return;

    // Update SVG height to match container
    svg.setAttribute('height', container.scrollHeight);

    targetLineNums.forEach(targetLineNum => {
        if (targetLineNum === -1) return; // end of trace — no arrow

        const dstRect = getLinenoSpanRect(targetLineNum, container);
        if (!dstRect) return;

        // Start point: right edge of source lineno span, vertically centered
        const x1 = srcRect.right + 2;
        const y1 = srcRect.midY;

        // End point: right edge of target lineno span, vertically centered
        const x2 = dstRect.right + 2;
        const y2 = dstRect.midY;

        // Horizontal offset for the bezier control points — curves to the right
        const curveOffset = Math.max(30, Math.abs(y2 - y1) * 0.4);

        // Cubic bezier: both control points extend to the right of the lineno column
        const cx1 = x1 + curveOffset;
        const cy1 = y1;
        const cx2 = x2 + curveOffset;
        const cy2 = y2;

        const path = document.createElementNS('http://www.w3.org/2000/svg', 'path');
        path.setAttribute('d', `M ${x1} ${y1} C ${cx1} ${cy1}, ${cx2} ${cy2}, ${x2} ${y2}`);
        path.setAttribute('class', 'exec-arrow arrow-path');
        path.setAttribute('marker-end', 'url(#arrowhead)');
        svg.appendChild(path);
    });
}
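
// The path construction above reduces to pure string math. A sketch of the
// same control-point rule (helper name `arrowPath` is hypothetical; extracted
// only to show how the 30px minimum offset keeps short hops visible):

```javascript
// Build the cubic-bezier path string drawArrows emits: both control points
// sit curveOffset pixels to the right, so the arrow bows out of the lineno
// column, and the offset never drops below 30px.
function arrowPath(x1, y1, x2, y2) {
    const curveOffset = Math.max(30, Math.abs(y2 - y1) * 0.4);
    const cx1 = x1 + curveOffset;
    const cx2 = x2 + curveOffset;
    return `M ${x1} ${y1} C ${cx1} ${y1}, ${cx2} ${y2}, ${x2} ${y2}`;
}

// 100px vertical span → offset 40; both control points at x = 42 + 40 = 82.
console.log(arrowPath(42, 10, 42, 110)); // "M 42 10 C 82 10, 82 110, 42 110"
```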

/**
 * Clear all arrows from the SVG overlay.
 */
function clearArrows() {
    const svg = document.getElementById('arrow-overlay');
    if (svg) {
        svg.querySelectorAll('.arrow-path').forEach(el => el.remove());
    }
}

// AbortController for the current set of marker hover listeners.
let _markerListenersAbort = null;

/**
 * Attach mouseenter/mouseleave handlers to all .task-marker spans so that
 * hovering shows execution-flow arrows to next lines.
 */
function attachArrowHoverHandlers() {
    // Cancel any previously attached listeners without touching the DOM.
    if (_markerListenersAbort) _markerListenersAbort.abort();
    _markerListenersAbort = new AbortController();
    const { signal } = _markerListenersAbort;

    document.querySelectorAll('.task-marker').forEach(marker => {
        marker.addEventListener('mouseenter', () => {
            const lineNum = parseInt(marker.dataset.lineno, 10);
            if (!lineNum) return;
            const nextLines = window._nextLinesMap[lineNum];
            if (nextLines && nextLines.length > 0) {
                drawArrows(lineNum, nextLines);
            }
        }, { signal });

        marker.addEventListener('mouseleave', () => {
            clearArrows();
        }, { signal });
    });
}

function showCruxTask(taskIdx) {
    const problem = window.currentProblem;
    const task = problem.tasks[taskIdx];

    // Update active button
    document.querySelectorAll('.task-btn').forEach((btn, idx) => {
        btn.classList.toggle('active', idx === taskIdx);
    });

    const givenLabel = task.given === 'input' ? 'Input (given)' : 'Output (given)';
    const predictLabel = task.predict === 'output' ? 'Output (predict)' : 'Input (predict)';
    const givenValue = task.given === 'input' ? task.input : task.output;
    const predictValue = task.predict === 'output' ? task.output : task.input;

    const html = `
        <div class="task-details">
            <div class="task-section">
                <p style="margin-bottom: 12px; color: #7f8c8d;">${escapeHtml(task.description)}</p>
            </div>
            <div class="io-section">
                <div class="task-section">
                    <h3>${escapeHtml(givenLabel)}</h3>
                    <pre class="code-block">${escapeHtml(givenValue)}</pre>
                </div>
                <div class="task-section">
                    <h3>${escapeHtml(predictLabel)}</h3>
                    <pre class="code-block crux-answer">${escapeHtml(predictValue)}</pre>
                </div>
            </div>
        </div>
    `;

    document.getElementById('task-content').innerHTML = html;
}

function showTask(taskIdx) {
    const problem = window.currentProblem;
    const task = problem.tasks[taskIdx];

    // Update active button
    const buttons = document.querySelectorAll('.task-btn');
    buttons.forEach((btn, idx) => {
        if (idx === taskIdx) {
            btn.classList.add('active');
        } else {
            btn.classList.remove('active');
        }
    });

    // Inject task markers into the code
    injectTaskMarkers(task.task_items);

    // Clear previous coverage while new one loads
    applyCoverage(null, 0);

    // Render task content
    const ioSection = task.test_class_code
        ? `<div class="io-section">
               <div class="task-section">
                   <h3>Input</h3>
                   <pre class="code-block">${escapeHtml(task.input)}</pre>
               </div>
           </div>
           <div class="task-section">
               <h3>Test Class — <code>${escapeHtml(task.test_class_name)}</code></h3>
               <pre class="code-block">${escapeHtml(task.test_class_code)}</pre>
           </div>`
        : `<div class="io-section">
               <div class="task-section">
                   <h3>Input</h3>
                   <pre class="code-block">${escapeHtml(task.input)}</pre>
               </div>
               <div class="task-section">
                   <h3>Expected Output</h3>
                   <pre class="code-block">${escapeHtml(task.output)}</pre>
               </div>
           </div>`;

    let html = `
        <div class="task-details">
            ${ioSection}
    `;

    // Show task items with ground truth answer placeholders
    if (task.task_items && task.task_items.length > 0) {
        html += `
            <div class="task-section">
                <h3>Reasoning Tasks</h3>
                <p style="margin-bottom: 10px; color: #7f8c8d;">
                    Variable state at each execution point (correct answer shown in
                    <span style="background:#17a2b8;color:white;padding:1px 6px;border-radius:3px;font-size:0.82rem;">teal</span>):
                </p>
                <ul class="task-items-list">
        `;

        task.task_items.forEach(item => {
            html += `
                <li>
                    <span class="line-ref">Line ${item.lineno}</span>
                    <span class="var-name">${escapeHtml(item.var)}</span>
                    <span class="gt-answer loading" id="gt-${item.lineno}-${item.var}">…</span>
                </li>
            `;
        });

        html += `
                </ul>
            </div>
        `;
    }

    // Show output prediction task if exists
    if (task.output_pred) {
        html += `
            <div class="task-section">
                <h3>Output Completion Task</h3>
                <p style="margin-bottom: 10px; color: #7f8c8d;">
                    The model needs to complete this test assertion:
                </p>
                <pre class="code-block">${escapeHtml(task.output_pred)}</pre>
            </div>
        `;
    }

    html += `</div>`;

    document.getElementById('task-content').innerHTML = html;

    // Fetch and apply ground truth (coverage + variable answers)
    if (hasGroundTruth && task.task_items) {
        loadAndApplyGroundTruth(problem.idx, task.input_idx, task.task_items);
    }
}

function showLangTab(idx) {
    document.querySelectorAll('.lang-tab').forEach((tab, i) => {
        tab.classList.toggle('active', i === idx);
    });
    document.querySelectorAll('.lang-code-panel').forEach((panel, i) => {
        panel.classList.toggle('active', i === idx);
    });
    // Update test section
    const problem = window.currentProblem;
    if (problem && problem.lang_solutions) {
        const sol = problem.lang_solutions[idx];
        const testContainer = document.getElementById('lang-test-container');
        if (testContainer && sol.test) {
            testContainer.innerHTML = `<h2>Test Suite</h2><pre class="code-block">${escapeHtml(sol.test)}</pre>`;
        } else if (testContainer) {
            testContainer.innerHTML = '';
        }
    }
}

function toggleCodeMode(mode) {
    window._currentCodeMode = mode;
    const problem = window.currentProblem;
    if (!problem || !problem.lang_solutions) return;

    // Update mode tabs
    const modeTabs = document.querySelectorAll('#code-mode-tabs .lang-tab');
    modeTabs.forEach(tab => {
        tab.classList.toggle('active', tab.textContent.trim().toLowerCase() === mode);
    });

    // Toggle visibility of canonical/buggy code in all panels
    problem.lang_solutions.forEach((sol, i) => {
        const canonical = document.getElementById('lang-code-canonical-' + i);
        const buggy = document.getElementById('lang-code-buggy-' + i);
        if (canonical) canonical.style.display = mode === 'canonical' ? '' : 'none';
        if (buggy) buggy.style.display = mode === 'buggy' ? '' : 'none';
    });
}

function showFimTab(idx) {
    const tabs = document.querySelectorAll('#fim-tabs .lang-tab');
    tabs.forEach((tab, i) => tab.classList.toggle('active', i === idx));
    for (let i = 0; i < 3; i++) {
        const panel = document.getElementById('fim-panel-' + i);
        if (panel) panel.classList.toggle('active', i === idx);
    }
}
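
// renderDiffFiles below counts additions and deletions per file while
// skipping the "+++"/"---" file-header lines. That counting rule in
// isolation (helper name `diffStats` is hypothetical, extracted only
// for illustration):

```javascript
// Count added/removed lines in a unified-diff chunk, excluding the
// "+++ b/file" and "--- a/file" header lines — mirrors the per-file
// stats renderDiffFiles computes for its header bar.
function diffStats(lines) {
    let adds = 0, dels = 0;
    lines.forEach(l => {
        if (l.startsWith('+') && !l.startsWith('+++')) adds++;
        if (l.startsWith('-') && !l.startsWith('---')) dels++;
    });
    return { adds, dels };
}

const sample = [
    'diff --git a/app.py b/app.py',
    '--- a/app.py',
    '+++ b/app.py',
    '@@ -1,2 +1,2 @@',
    '-print("hi")',
    '+print("hello")',
];
console.log(diffStats(sample)); // { adds: 1, dels: 1 }
```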
|
| 1096 |
+
|
| 1097 |
+
+/**
+ * Split a unified diff into per-file sections and render each with a GitHub-style
+ * file header bar. Returns an HTML string with one card per file.
+ */
+function renderDiffFiles(diffText, title) {
+    if (!diffText) return '';
+    // Split into per-file chunks by "diff --git" boundaries
+    const files = [];
+    let current = null;
+    diffText.split('\n').forEach(line => {
+        if (line.startsWith('diff --git')) {
+            if (current) files.push(current);
+            // Extract file path from "diff --git a/path b/path"
+            const m = line.match(/^diff --git a\/(.+?) b\/(.+)/);
+            const filePath = m ? m[2] : line;
+            current = { path: filePath, lines: [line] };
+        } else if (current) {
+            current.lines.push(line);
+        } else {
+            // Lines before any diff header: create a default section
+            if (!current) current = { path: '', lines: [] };
+            current.lines.push(line);
+        }
+    });
+    if (current) files.push(current);
+
+    if (files.length === 0) return '';
+
+    let html = '';
+    if (files.length === 1 && !files[0].path) {
+        // Single unnamed diff: render as before
+        html += `<div class="card"><h2>${escapeHtml(title)}</h2><div class="diff-view">${renderDiff(diffText)}</div></div>`;
+    } else {
+        html += `<div class="card"><h2>${escapeHtml(title)}</h2>`;
+        files.forEach(file => {
+            const diffChunk = file.lines.join('\n');
+            // Count additions/deletions
+            let adds = 0, dels = 0;
+            file.lines.forEach(l => {
+                if (l.startsWith('+') && !l.startsWith('+++')) adds++;
+                if (l.startsWith('-') && !l.startsWith('---')) dels++;
+            });
+            const statsHtml = `<span class="diff-file-stats"><span class="diff-stat-add">+${adds}</span> <span class="diff-stat-del">-${dels}</span></span>`;
+            html += `
+                <div class="diff-file-section">
+                    <div class="diff-file-header">
+                        <span class="diff-file-path">${escapeHtml(file.path)}</span>
+                        ${statsHtml}
+                    </div>
+                    <div class="diff-view">${renderDiff(diffChunk)}</div>
+                </div>
+            `;
+        });
+        html += `</div>`;
+    }
+    return html;
+}
+
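The per-file splitting above can be exercised outside the browser. The sketch below (hypothetical helper name `splitDiffByFile`, made-up sample diff text) isolates the same two ideas the function relies on: splitting on `diff --git` boundaries, and counting `+`/`-` lines while excluding the `+++`/`---` file headers.

```javascript
// Minimal sketch of splitting a unified diff on "diff --git" boundaries
// and counting additions/deletions per file, mirroring renderDiffFiles.
function splitDiffByFile(diffText) {
  const files = [];
  let current = null;
  for (const line of diffText.split('\n')) {
    if (line.startsWith('diff --git')) {
      if (current) files.push(current);
      // "diff --git a/path b/path" -> take the b/ side as the file path
      const m = line.match(/^diff --git a\/(.+?) b\/(.+)/);
      current = { path: m ? m[2] : line, adds: 0, dels: 0 };
    } else if (current) {
      // "+++"/"---" are file headers, not content changes
      if (line.startsWith('+') && !line.startsWith('+++')) current.adds++;
      if (line.startsWith('-') && !line.startsWith('---')) current.dels++;
    }
  }
  if (current) files.push(current);
  return files;
}

const sample = [
  'diff --git a/app.py b/app.py',
  '--- a/app.py',
  '+++ b/app.py',
  '+import os',
  '-import sys',
  'diff --git a/util.py b/util.py',
  '+def helper(): pass',
].join('\n');

const files = splitDiffByFile(sample);
console.log(files.length);                  // 2
console.log(files[0].adds, files[0].dels);  // 1 1
```

Note the lazy `(.+?)` in the path regex: it stops at the first ` b/`, so the `b/` path in the second capture group is taken as the display path, matching GitHub's convention of showing the post-change name.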
+/**
+ * Render a unified diff with line numbers and file headers (GitHub-style).
+ */
+function renderDiff(diffText) {
+    if (!diffText) return '';
+    const lines = diffText.split('\n');
+    let oldLine = 0, newLine = 0;
+    const rows = [];
+
+    lines.forEach(line => {
+        if (line.startsWith('diff ')) {
+            rows.push(`<tr class="diff-tr-header"><td class="diff-ln"></td><td class="diff-ln"></td><td class="diff-td-header">${escapeHtml(line)}</td></tr>`);
+            return;
+        }
+        if (line.startsWith('---') || line.startsWith('+++')) {
+            rows.push(`<tr class="diff-tr-header"><td class="diff-ln"></td><td class="diff-ln"></td><td class="diff-td-header">${escapeHtml(line)}</td></tr>`);
+            return;
+        }
+        if (line.startsWith('@@')) {
+            // Parse hunk header: @@ -oldStart,oldCount +newStart,newCount @@
+            const m = line.match(/@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
+            if (m) {
+                oldLine = parseInt(m[1]);
+                newLine = parseInt(m[2]);
+            }
+            rows.push(`<tr class="diff-tr-hunk"><td class="diff-ln"></td><td class="diff-ln"></td><td class="diff-td-hunk">${escapeHtml(line)}</td></tr>`);
+            return;
+        }
+        if (line.startsWith('+')) {
+            rows.push(`<tr class="diff-tr-add"><td class="diff-ln"></td><td class="diff-ln">${newLine}</td><td class="diff-td-add">${escapeHtml(line.substring(1))}</td></tr>`);
+            newLine++;
+        } else if (line.startsWith('-')) {
+            rows.push(`<tr class="diff-tr-del"><td class="diff-ln">${oldLine}</td><td class="diff-ln"></td><td class="diff-td-del">${escapeHtml(line.substring(1))}</td></tr>`);
+            oldLine++;
+        } else if (line.startsWith(' ')) {
+            rows.push(`<tr class="diff-tr-ctx"><td class="diff-ln">${oldLine}</td><td class="diff-ln">${newLine}</td><td class="diff-td-ctx">${escapeHtml(line.substring(1))}</td></tr>`);
+            oldLine++;
+            newLine++;
+        } else if (line.trim() === '') {
+            // Empty trailing line
+        } else {
+            rows.push(`<tr class="diff-tr-ctx"><td class="diff-ln">${oldLine}</td><td class="diff-ln">${newLine}</td><td class="diff-td-ctx">${escapeHtml(line)}</td></tr>`);
+            oldLine++;
+            newLine++;
+        }
+    });
+
+    return `<table class="diff-table">${rows.join('')}</table>`;
+}
+
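The hunk-header pattern used by renderDiff makes the counts optional, since `@@ -1 +1 @@` is a valid header for single-line hunks. A standalone check of that regex (hypothetical wrapper name `parseHunkHeader`):

```javascript
// The same hunk-header pattern used in renderDiff: counts after the
// comma are optional per the unified diff format.
const HUNK_RE = /@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@/;

function parseHunkHeader(line) {
  const m = line.match(HUNK_RE);
  if (!m) return null;
  // m[1]/m[2] are the old/new start lines; counts are discarded here
  return { oldStart: parseInt(m[1], 10), newStart: parseInt(m[2], 10) };
}

console.log(parseHunkHeader('@@ -92,6 +92,159 @@'));  // { oldStart: 92, newStart: 92 }
console.log(parseHunkHeader('@@ -1 +1 @@'));          // { oldStart: 1, newStart: 1 }
console.log(parseHunkHeader('not a hunk'));           // null
```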
+/**
+ * Simple line-by-line diff (LCS-based) between two code strings.
+ * Returns an array of {type: 'context'|'add'|'del', line: string}.
+ */
+function computeUnifiedDiff(oldText, newText) {
+    const oldLines = (oldText || '').split('\n');
+    const newLines = (newText || '').split('\n');
+
+    // LCS for line sequences
+    const m = oldLines.length, n = newLines.length;
+    // For very large files, just show both in full instead of computing LCS
+    if (m * n > 500000) {
+        const result = [];
+        oldLines.forEach(l => result.push({type: 'del', line: l}));
+        newLines.forEach(l => result.push({type: 'add', line: l}));
+        return result;
+    }
+
+    const dp = Array.from({length: m + 1}, () => new Uint16Array(n + 1));
+    for (let i = 1; i <= m; i++) {
+        for (let j = 1; j <= n; j++) {
+            if (oldLines[i - 1] === newLines[j - 1]) {
+                dp[i][j] = dp[i - 1][j - 1] + 1;
+            } else {
+                dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
+            }
+        }
+    }
+
+    // Backtrack to build diff
+    const result = [];
+    let i = m, j = n;
+    while (i > 0 || j > 0) {
+        if (i > 0 && j > 0 && oldLines[i - 1] === newLines[j - 1]) {
+            result.push({type: 'context', line: oldLines[i - 1]});
+            i--; j--;
+        } else if (j > 0 && (i === 0 || dp[i][j - 1] >= dp[i - 1][j])) {
+            result.push({type: 'add', line: newLines[j - 1]});
+            j--;
+        } else {
+            result.push({type: 'del', line: oldLines[i - 1]});
+            i--;
+        }
+    }
+    result.reverse();
+
+    // Compact: only show hunks with context (3 lines around changes)
+    const contextSize = 3;
+    const hasChange = result.map(r => r.type !== 'context');
+    const show = new Uint8Array(result.length);
+    for (let k = 0; k < result.length; k++) {
+        if (hasChange[k]) {
+            for (let c = Math.max(0, k - contextSize); c <= Math.min(result.length - 1, k + contextSize); c++) {
+                show[c] = 1;
+            }
+        }
+    }
+
+    const compacted = [];
+    let lastShown = -1;
+    for (let k = 0; k < result.length; k++) {
+        if (show[k]) {
+            if (lastShown >= 0 && k - lastShown > 1) {
+                compacted.push({type: 'separator', line: '...'});
+            }
+            compacted.push(result[k]);
+            lastShown = k;
+        }
+    }
+
+    return compacted.length > 0 ? compacted : result;
+}
+
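The core of computeUnifiedDiff is a standard longest-common-subsequence table over lines, followed by a backtrack that emits context/add/del entries. A self-contained sketch of just that core (hypothetical name `lineDiff`; the context-compaction pass and large-file bailout are omitted for brevity) can be run under Node:

```javascript
// LCS-based line diff, same dp table and backtracking tie-break as
// computeUnifiedDiff above (adds are preferred on ties).
function lineDiff(oldText, newText) {
  const A = oldText.split('\n'), B = newText.split('\n');
  const m = A.length, n = B.length;
  // dp[i][j] = LCS length of A[0..i) and B[0..j)
  const dp = Array.from({ length: m + 1 }, () => new Uint16Array(n + 1));
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      dp[i][j] = A[i - 1] === B[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
    }
  }
  // Backtrack from the bottom-right corner, then reverse
  const out = [];
  let i = m, j = n;
  while (i > 0 || j > 0) {
    if (i > 0 && j > 0 && A[i - 1] === B[j - 1]) {
      out.push({ type: 'context', line: A[i - 1] }); i--; j--;
    } else if (j > 0 && (i === 0 || dp[i][j - 1] >= dp[i - 1][j])) {
      out.push({ type: 'add', line: B[j - 1] }); j--;
    } else {
      out.push({ type: 'del', line: A[i - 1] }); i--;
    }
  }
  return out.reverse();
}

const d = lineDiff('a\nb\nc', 'a\nx\nc');
console.log(d.map(e => e.type).join(','));  // "context,del,add,context"
```

The `Uint16Array` rows cap LCS lengths at 65535 lines, which is why the real function bails out to a full del/add listing when `m * n` exceeds its threshold.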
+/**
+ * Render the output of computeUnifiedDiff into diff HTML with line numbers.
+ */
+function renderComputedDiff(diffEntries) {
+    let oldLine = 1, newLine = 1;
+    const rows = diffEntries.map(entry => {
+        if (entry.type === 'separator') {
+            return `<tr class="diff-tr-hunk"><td class="diff-ln"></td><td class="diff-ln"></td><td class="diff-td-hunk">${escapeHtml(entry.line)}</td></tr>`;
+        }
+        if (entry.type === 'del') {
+            const row = `<tr class="diff-tr-del"><td class="diff-ln">${oldLine}</td><td class="diff-ln"></td><td class="diff-td-del">${escapeHtml(entry.line)}</td></tr>`;
+            oldLine++;
+            return row;
+        }
+        if (entry.type === 'add') {
+            const row = `<tr class="diff-tr-add"><td class="diff-ln"></td><td class="diff-ln">${newLine}</td><td class="diff-td-add">${escapeHtml(entry.line)}</td></tr>`;
+            newLine++;
+            return row;
+        }
+        // context
+        const row = `<tr class="diff-tr-ctx"><td class="diff-ln">${oldLine}</td><td class="diff-ln">${newLine}</td><td class="diff-td-ctx">${escapeHtml(entry.line)}</td></tr>`;
+        oldLine++;
+        newLine++;
+        return row;
+    });
+    return `<table class="diff-table">${rows.join('')}</table>`;
+}
+
+function escapeHtml(text) {
+    if (text === null || text === undefined) return '';
+    const div = document.createElement('div');
+    div.textContent = text;
+    return div.innerHTML;
+}
+
+loadProblem();
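The escapeHtml helper above leans on the DOM (`textContent` in, `innerHTML` out), which escapes `&`, `<`, and `>`. For environments without a `document` object, such as unit tests under Node, a string-based approximation (hypothetical name `escapeHtmlPlain`; not the app's code, and not byte-for-byte identical to browser serialization in every edge case) looks like this:

```javascript
// DOM-free approximation of escapeHtml: escape the same three
// characters the textContent/innerHTML round-trip escapes.
// Order matters: '&' must be replaced first or it would re-escape
// the entities produced by the later replacements.
function escapeHtmlPlain(text) {
  if (text === null || text === undefined) return '';
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

console.log(escapeHtmlPlain('<div class="x">&'));  // &lt;div class="x"&gt;&amp;
```

Like the DOM version, this leaves quote characters alone, which is safe for element text content but not for unquoted attribute values.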
@@ -92,6 +92,159 @@
     color: white;
 }
 
+.badge-mbpp {
+    background: #16a085;
+    color: white;
+}
+
+.badge-codeforces {
+    background: #e74c3c;
+    color: white;
+}
+
+.badge-leetcode {
+    background: #f39c12;
+    color: white;
+}
+
+.badge-atcoder {
+    background: #2ecc71;
+    color: white;
+}
+
+.badge-cppsyntaxerror, .badge-cppreferenceerror, .badge-cpplogicerror, .badge-cppmultipleerror {
+    background: #3498db;
+    color: white;
+}
+
+.badge-javasyntaxerror, .badge-javareferenceerror, .badge-javalogicerror, .badge-javamultipleerror {
+    background: #e67e22;
+    color: white;
+}
+
+.badge-pythonsyntaxerror, .badge-pythonreferenceerror, .badge-pythonlogicerror, .badge-pythonmultipleerror {
+    background: #2ecc71;
+    color: white;
+}
+
+.badge-humanevalx {
+    background: #1abc9c;
+    color: white;
+}
+
+/* SWE-bench repo badges */
+.badge-djangodjango, .badge-astropyastropy, .badge-matabormataborlib, .badge-scikitimagescikitimage {
+    background: #0d6efd;
+    color: white;
+}
+
+.badge-sympy, .badge-sympysympy, .badge-pylintdevpylint, .badge-sphinxdocsphinx,
+.badge-palletstflask, .badge-palletsjinja, .badge-pyaborpyabor, .badge-pytestdevpytest {
+    background: #6610f2;
+    color: white;
+}
+
+/* APPS difficulty badges */
+.badge-introductory {
+    background: #27ae60;
+    color: white;
+}
+
+.badge-interview {
+    background: #f39c12;
+    color: white;
+}
+
+.badge-competition {
+    background: #e74c3c;
+    color: white;
+}
+
+/* CanItEdit change kind badges */
+.badge-adaptive {
+    background: #3498db;
+    color: white;
+}
+
+.badge-perfective {
+    background: #2ecc71;
+    color: white;
+}
+
+.badge-corrective {
+    background: #e67e22;
+    color: white;
+}
+
+.badge-canitedit {
+    background: #9b59b6;
+    color: white;
+}
+
+/* CodeContests source badges (extend existing) */
+.badge-codechef {
+    background: #5b4638;
+    color: white;
+}
+
+.badge-codejam {
+    background: #4285f4;
+    color: white;
+}
+
+.badge-hackerearth {
+    background: #2c3454;
+    color: white;
+}
+
+.badge-aizu {
+    background: #0089d0;
+    color: white;
+}
+
+.badge-unknown {
+    background: #95a5a6;
+    color: white;
+}
+
+/* SAFIM language badges */
+.badge-python, .badge-java, .badge-c {
+    background: #3498db;
+    color: white;
+}
+
+/* Vulnerability badges */
+.badge-vulnerable {
+    background: #e74c3c;
+    color: white;
+}
+
+.badge-patched {
+    background: #27ae60;
+    color: white;
+}
+
+/* CodeEditorBench type badges */
+.badge-codedebug {
+    background: #e74c3c;
+    color: white;
+}
+
+.badge-codetranslate {
+    background: #3498db;
+    color: white;
+}
+
+.badge-codepolish {
+    background: #2ecc71;
+    color: white;
+}
+
+.badge-coderequirementswitch {
+    background: #9b59b6;
+    color: white;
+}
+
 .badge-info {
     background: #ecf0f1;
     color: #2c3e50;
@@ -192,6 +345,7 @@
 
 {% block extra_css %}{% endblock %}
 </style>
+{% block extra_head %}{% endblock %}
 </head>
 <body>
 <header>
@@ -84,55 +84,45 @@
     color: #7f8c8d;
 }
 
-.stats {
+.stats-bar {
     display: flex;
+    align-items: center;
+    gap: 16px;
+    margin-bottom: 16px;
+    font-size: 0.9rem;
+    color: #555;
     flex-wrap: wrap;
 }
 
-.stat-card {
-    box-shadow: 0 2px 4px rgba(0,0,0,0.1);
-    flex: 1;
-    min-width: 200px;
+.stats-total {
+    font-weight: 700;
+    color: #2c3e50;
+    font-size: 0.95rem;
 }
 
-.stat-number {
-    font-weight: 700;
-    color: #3498db;
+.stats-sep {
+    color: #ccc;
 }
 
-.stat-label {
+.stats-tag {
+    display: inline-flex;
+    align-items: center;
+    gap: 4px;
+}
+
+.stats-tag-name {
     color: #7f8c8d;
 }
+
+.stats-tag-count {
+    font-weight: 600;
+    color: #2c3e50;
+}
 </style>
 {% endblock %}
 
 {% block content %}
-<div class="stats" id="stats">
-    <div class="stat-card">
-        <div class="stat-number" id="total-problems">-</div>
-        <div class="stat-label">Total Problems</div>
-    </div>
-    <div class="stat-card" id="stat-source-a">
-        <div class="stat-number" id="source-a-count">-</div>
-        <div class="stat-label" id="source-a-label">Source A</div>
-    </div>
-    <div class="stat-card" id="stat-source-b">
-        <div class="stat-number" id="source-b-count">-</div>
-        <div class="stat-label" id="source-b-label">Source B</div>
-    </div>
-    <div class="stat-card">
-        <div class="stat-number" id="filtered-count">-</div>
-        <div class="stat-label">Displayed</div>
-    </div>
-</div>
+<div class="stats-bar" id="stats-bar"></div>
 
 <div class="card">
 <h2>Filter Problems</h2>
@@ -179,7 +169,10 @@ async function loadDatasets() {
     datasets.forEach(ds => {
         const opt = document.createElement('option');
         opt.value = ds.slug;
+        const countLabel = ds.total_count
+            ? `${ds.problem_count} of ${ds.total_count}`
+            : `${ds.problem_count}`;
+        opt.textContent = `${ds.display_name} (${countLabel})`;
         if (ds.slug === currentDataset) opt.selected = true;
         select.appendChild(opt);
     });
@@ -236,27 +229,22 @@ function updateStats() {
         sources[p.source] = (sources[p.source] || 0) + 1;
     });
 
-        statA.style.display = 'none';
-    }
-
-    if (sourceNames.length >= 2) {
-        statB.style.display = '';
-        document.getElementById('source-b-count').textContent = sources[sourceNames[1]];
-        document.getElementById('source-b-label').textContent = sourceNames[1];
-    } else {
-        statB.style.display = 'none';
+    const sorted = Object.entries(sources)
+        .sort((a, b) => b[1] - a[1]);
+    const top5 = sorted.slice(0, 5);
+    const otherCount = sorted.slice(5).reduce((sum, [, c]) => sum + c, 0);
+
+    const bar = document.getElementById('stats-bar');
+    let html = `<span class="stats-total">Total: ${allProblems.length}</span>`;
+    top5.forEach(([name, count]) => {
+        html += `<span class="stats-sep">|</span>`;
+        html += `<span class="stats-tag"><span class="stats-tag-name">${name}:</span> <span class="stats-tag-count">${count}</span></span>`;
+    });
+    if (otherCount > 0) {
+        html += `<span class="stats-sep">|</span>`;
+        html += `<span class="stats-tag"><span class="stats-tag-name">Other:</span> <span class="stats-tag-count">${otherCount}</span></span>`;
     }
+    bar.innerHTML = html;
 }
@@ -268,12 +256,9 @@ function renderProblems(problems) {
 
     if (problems.length === 0) {
         container.innerHTML = '<div class="card"><p>No problems match your filters.</p></div>';
-        document.getElementById('filtered-count').textContent = '0';
         return;
     }
 
-    document.getElementById('filtered-count').textContent = problems.length;
-
     const grid = document.createElement('div');
     grid.className = 'problems-grid';
 
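The aggregation behind the compact stats bar in updateStats() is simple enough to test in isolation: sort the source counts descending, keep the top five, and fold the remainder into an "Other" bucket. A standalone sketch (hypothetical helper name `topSources`; the sample counts are made up):

```javascript
// Top-N source aggregation as used by the stats bar: entries sorted by
// count descending, overflow summed into a single "other" total.
function topSources(sources, limit = 5) {
  const sorted = Object.entries(sources).sort((a, b) => b[1] - a[1]);
  const top = sorted.slice(0, limit);
  const other = sorted.slice(limit).reduce((sum, [, c]) => sum + c, 0);
  return { top, other };
}

const { top, other } = topSources({
  leetcode: 40, mbpp: 30, codeforces: 20, atcoder: 10, aizu: 5,
  codechef: 3, codejam: 2,
});
console.log(top.length, other);  // 5 5
console.log(top[0]);             // [ 'leetcode', 40 ]
```

Keeping the aggregation separate from the `innerHTML` assembly also makes the DOM-writing half trivially replaceable, which is what the extraction of this logic into static/problem.js relies on.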
@@ -20,283 +20,11 @@
|
|
| 20 |
{% endblock %}
|
| 21 |
|
| 22 |
{% block extra_css %}
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
.problem-header {
|
| 27 |
-
display: flex;
|
| 28 |
-
justify-content: space-between;
|
| 29 |
-
align-items: center;
|
| 30 |
-
margin-bottom: 15px;
|
| 31 |
-
}
|
| 32 |
-
|
| 33 |
-
.problem-meta {
|
| 34 |
-
margin-bottom: 20px;
|
| 35 |
-
}
|
| 36 |
-
|
| 37 |
-
.meta-item {
|
| 38 |
-
display: inline-block;
|
| 39 |
-
margin-right: 15px;
|
| 40 |
-
margin-bottom: 10px;
|
| 41 |
-
}
|
| 42 |
-
|
| 43 |
-
.meta-label {
|
| 44 |
-
font-weight: 600;
|
| 45 |
-
color: #7f8c8d;
|
| 46 |
-
margin-right: 5px;
|
| 47 |
-
}
|
| 48 |
-
|
| 49 |
-
.meta-value {
|
| 50 |
-
color: #2c3e50;
|
| 51 |
-
}
|
| 52 |
-
|
| 53 |
-
.task-selector {
|
| 54 |
-
margin: 20px 0;
|
| 55 |
-
display: flex;
|
| 56 |
-
gap: 10px;
|
| 57 |
-
flex-wrap: wrap;
|
| 58 |
-
}
|
| 59 |
-
|
| 60 |
-
.task-btn {
|
| 61 |
-
padding: 10px 20px;
|
| 62 |
-
background: #ecf0f1;
|
| 63 |
-
border: 2px solid transparent;
|
| 64 |
-
border-radius: 4px;
|
| 65 |
-
cursor: pointer;
|
| 66 |
-
transition: all 0.3s;
|
| 67 |
-
font-size: 0.95rem;
|
| 68 |
-
}
|
| 69 |
-
|
| 70 |
-
.task-btn:hover {
|
| 71 |
-
background: #bdc3c7;
|
| 72 |
-
}
|
| 73 |
-
|
| 74 |
-
.task-btn.active {
|
| 75 |
-
background: #3498db;
|
| 76 |
-
color: white;
|
| 77 |
-
border-color: #2980b9;
|
| 78 |
-
}
|
| 79 |
-
|
| 80 |
-
.task-details {
|
| 81 |
-
margin-top: 20px;
|
| 82 |
-
}
|
| 83 |
-
|
| 84 |
-
.task-section {
|
| 85 |
-
margin-bottom: 25px;
|
| 86 |
-
padding: 15px;
|
| 87 |
-
background: #f8f9fa;
|
| 88 |
-
border-left: 4px solid #3498db;
|
| 89 |
-
border-radius: 4px;
|
| 90 |
-
}
|
| 91 |
-
|
| 92 |
-
.task-section h3 {
|
| 93 |
-
margin-bottom: 10px;
|
| 94 |
-
color: #2c3e50;
|
| 95 |
-
font-size: 1.1rem;
|
| 96 |
-
}
|
| 97 |
-
|
| 98 |
-
.code-block {
|
| 99 |
-
background: #f8f9fa;
|
| 100 |
-
padding: 15px;
|
| 101 |
-
border-radius: 4px;
|
| 102 |
-
overflow-x: auto;
|
| 103 |
-
font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace;
|
| 104 |
-
font-size: 0.9rem;
|
| 105 |
-
border: 1px solid #e1e4e8;
|
| 106 |
-
}
|
| 107 |
-
|
| 108 |
-
.task-items-list {
|
| 109 |
-
list-style: none;
|
| 110 |
-
}
|
| 111 |
-
|
| 112 |
-
.task-items-list li {
|
| 113 |
-
padding: 10px;
|
| 114 |
-
margin-bottom: 8px;
|
| 115 |
-
background: white;
|
| 116 |
-
border-radius: 4px;
|
| 117 |
-
border: 1px solid #e1e4e8;
|
| 118 |
-
}
|
| 119 |
-
|
| 120 |
-
.line-ref {
|
| 121 |
-
display: inline-block;
|
| 122 |
-
padding: 2px 8px;
|
| 123 |
-
background: #3498db;
|
| 124 |
-
color: white;
|
| 125 |
-
border-radius: 3px;
|
| 126 |
-
font-family: monospace;
|
| 127 |
-
font-size: 0.85rem;
|
| 128 |
-
margin-right: 8px;
|
| 129 |
-
}
|
| 130 |
-
|
| 131 |
-
.var-name {
|
| 132 |
-
display: inline-block;
|
| 133 |
-
padding: 2px 8px;
|
| 134 |
-
background: #9b59b6;
|
| 135 |
-
color: white;
|
| 136 |
-
border-radius: 3px;
|
| 137 |
-
font-family: monospace;
|
| 138 |
-
font-size: 0.85rem;
|
| 139 |
-
}
|
| 140 |
-
|
| 141 |
-
.io-section {
|
| 142 |
-
display: grid;
|
| 143 |
-
grid-template-columns: 1fr 1fr;
|
| 144 |
-
gap: 15px;
|
| 145 |
-
}
|
| 146 |
-
|
| 147 |
-
@media (max-width: 768px) {
|
| 148 |
-
.io-section {
|
| 149 |
-
grid-template-columns: 1fr;
|
| 150 |
-
}
|
| 151 |
-
}
|
| 152 |
-
|
| 153 |
-
.navigation-hint {
|
| 154 |
-
margin-top: 20px;
|
| 155 |
-
padding: 15px;
|
| 156 |
-
background: #e8f4f8;
|
| 157 |
-
border-radius: 4px;
|
| 158 |
-
color: #2c3e50;
|
| 159 |
-
font-size: 0.9rem;
|
| 160 |
-
}
|
| 161 |
-
|
| 162 |
-
.test-code-section {
|
| 163 |
-
margin-top: 20px;
|
| 164 |
-
}
|
| 165 |
-
|
| 166 |
-
/* Inline task visualization */
|
| 167 |
-
.code-with-tasks {
|
| 168 |
-
position: relative;
|
| 169 |
-
}
|
| 170 |
-
|
| 171 |
-
.task-marker {
|
| 172 |
-
display: inline-block;
|
| 173 |
-
margin-left: 10px;
|
| 174 |
-
padding: 2px 8px;
|
| 175 |
-
background: #9b59b6;
|
| 176 |
-
color: white;
|
| 177 |
-
border-radius: 3px;
|
| 178 |
-
font-size: 0.75rem;
|
| 179 |
-
font-weight: 600;
|
| 180 |
-
cursor: crosshair;
|
| 181 |
-
}
|
| 182 |
-
|
| 183 |
-
/* Coverage coloring on lineno spans.
|
| 184 |
-
Pygments emits: td.linenos > div.linenodiv > pre > span.normal
|
| 185 |
-
We must match that chain; .source .linenos doesn't work because
|
| 186 |
-
the td has class "linenos", not an element named "linenos". */
|
| 187 |
-
td.linenos .normal.line-executed {
|
| 188 |
-
background-color: #d4edda !important;
|
| 189 |
-
color: #155724 !important;
|
| 190 |
-
}
|
| 191 |
-
|
| 192 |
-
td.linenos .normal.line-not-executed {
|
| 193 |
-
background-color: #f8d7da !important;
|
| 194 |
-
color: #721c24 !important;
|
| 195 |
-
}
|
| 196 |
-
|
| 197 |
-
/* Coverage legend */
|
| 198 |
-
.coverage-legend {
|
| 199 |
-
margin: 10px 0;
|
| 200 |
-
padding: 10px 15px;
|
| 201 |
-
background: #f8f9fa;
|
| 202 |
-
border-left: 4px solid #28a745;
|
| 203 |
-
border-radius: 4px;
|
| 204 |
-
font-size: 0.85rem;
|
| 205 |
-
display: none;
|
| 206 |
-
}
|
| 207 |
-
|
| 208 |
-
.coverage-legend-item {
|
| 209 |
-
display: inline-block;
|
| 210 |
-
margin-right: 18px;
|
| 211 |
-
}
|
| 212 |
-
|
| 213 |
-
.coverage-swatch {
|
| 214 |
-
display: inline-block;
|
| 215 |
-
width: 12px;
|
| 216 |
-
height: 12px;
|
| 217 |
-
border-radius: 2px;
|
| 218 |
-
margin-right: 4px;
|
| 219 |
-
vertical-align: middle;
|
| 220 |
-
}
|
| 221 |
-
|
| 222 |
-
/* Ground truth answer badge shown next to task items */
|
| 223 |
-
.gt-answer {
|
| 224 |
-
display: inline-block;
|
| 225 |
-
margin-left: 10px;
|
| 226 |
-
padding: 2px 8px;
|
| 227 |
-
background: #17a2b8;
|
| 228 |
-
color: white;
|
| 229 |
-
border-radius: 3px;
|
| 230 |
-
font-family: monospace;
|
| 231 |
-
font-size: 0.82rem;
|
| 232 |
-
font-weight: 600;
|
| 233 |
-
}
|
| 234 |
-
|
| 235 |
-
.gt-answer.loading {
|
| 236 |
-
background: #6c757d;
|
| 237 |
-
}
|
| 238 |
-
|
| 239 |
-
/* SVG arrow overlay positioned over the code container */
|
| 240 |
-
#arrow-overlay {
|
| 241 |
-
position: absolute;
|
| 242 |
-
top: 0;
|
| 243 |
-
left: 0;
|
| 244 |
-
width: 100%;
|
| 245 |
-
height: 100%;
|
| 246 |
-
pointer-events: none;
|
| 247 |
-
overflow: visible;
|
| 248 |
-
z-index: 10;
|
| 249 |
-
}
|
| 250 |
-
|
| 251 |
-
.exec-arrow {
|
| 252 |
-
fill: none;
|
| 253 |
-
stroke: #e67e22;
|
| 254 |
-
stroke-width: 2.5;
|
| 255 |
-
stroke-dasharray: none;
|
| 256 |
-
opacity: 0.9;
|
| 257 |
-
}
|
| 258 |
-
|
| 259 |
-
.exec-arrow-head {
|
| 260 |
-
fill: #e67e22;
|
| 261 |
-
opacity: 0.9;
|
| 262 |
-
}
|
| 263 |
-
|
| 264 |
-
/* CRUXEval answer highlight */
|
| 265 |
-
.crux-answer {
|
| 266 |
-
border-left: 4px solid #17a2b8 !important;
|
| 267 |
-
background: #e8f6f8 !important;
|
| 268 |
-
}
|
| 269 |
-
|
| 270 |
-
/* BigOBench complexity display */
|
| 271 |
-
.complexity-badges {
|
| 272 |
-
display: flex;
|
| 273 |
-
gap: 20px;
|
| 274 |
-
flex-wrap: wrap;
|
| 275 |
-
}
|
| 276 |
-
|
| 277 |
-
.complexity-item {
|
| 278 |
-
display: flex;
|
| 279 |
-
align-items: center;
|
| 280 |
-
gap: 10px;
|
| 281 |
-
}
|
| 282 |
-
|
| 283 |
-
.complexity-label {
|
| 284 |
-
font-weight: 600;
|
| 285 |
-
color: #7f8c8d;
|
| 286 |
-
font-size: 0.95rem;
|
| 287 |
-
}
|
| 288 |
|
| 289 |
-
|
| 290 |
-
|
| 291 |
-
padding: 6px 16px;
|
| 292 |
-
background: #2c3e50;
|
| 293 |
-
color: #f1c40f;
|
| 294 |
-
border-radius: 4px;
|
| 295 |
-
font-family: 'Monaco', 'Menlo', 'Ubuntu Mono', monospace;
|
| 296 |
-
font-size: 1.1rem;
|
| 297 |
-
font-weight: 600;
|
| 298 |
-
}
|
| 299 |
-
</style>
|
| 300 |
{% endblock %}
|
| 301 |
|
| 302 |
{% block content %}
|
|
@@ -315,701 +43,6 @@ const datasetSlug = {{ dataset_slug|tojson }};
|
|
| 315 |
const datasetName = {{ dataset_name|tojson }};
|
| 316 |
const hasGroundTruth = {{ has_ground_truth|tojson }};
|
| 317 |
const hasTasks = {{ has_tasks|tojson }};
|
| 318 |
-
|
| 319 |
-
function badgeClass(source) {
|
| 320 |
-
return 'badge-' + source.toLowerCase().replace(/[^a-z0-9]/g, '');
|
| 321 |
-
}
|
| 322 |
-
|
| 323 |
-
async function loadProblem() {
|
| 324 |
-
try {
|
| 325 |
-
const response = await fetch(`/api/${datasetSlug}/problem/${problemIdx}`);
|
| 326 |
-
const problem = await response.json();
|
| 327 |
-
|
| 328 |
-
if (problem.error) {
|
| 329 |
-
document.getElementById('problem-content').innerHTML =
|
| 330 |
-
'<div class="card"><p style="color: red;">Error: ' + problem.error + '</p></div>';
|
| 331 |
-
return;
|
| 332 |
-
}
|
| 333 |
-
|
| 334 |
-
renderProblem(problem);
|
| 335 |
-
} catch (error) {
|
| 336 |
-
document.getElementById('problem-content').innerHTML =
|
| 337 |
-
'<div class="card"><p style="color: red;">Error loading problem: ' + error.message + '</p></div>';
|
| 338 |
-
}
|
| 339 |
-
}
|
| 340 |
-
|
| 341 |
-
-function renderProblem(problem) {
-    const container = document.getElementById('problem-content');
-
-    // Main problem info card (shared by all datasets)
-    let html = `
-        <div class="card">
-            <div class="problem-header">
-                <h2>${escapeHtml(problem.entry_point)}</h2>
-                <span class="badge ${badgeClass(problem.source)}">${escapeHtml(problem.source)}</span>
-            </div>
-            <div class="problem-meta">
-                <div class="meta-item">
-                    <span class="meta-label">Task ID:</span>
-                    <span class="meta-value">${escapeHtml(problem.task_id)}</span>
-                </div>
-                <div class="meta-item">
-                    <span class="meta-label">Index:</span>
-                    <span class="meta-value">${problem.idx}</span>
-                </div>
-                <div class="meta-item">
-                    <span class="meta-label">Dataset:</span>
-                    <span class="meta-value">${escapeHtml(datasetName)}</span>
-                </div>
-                ${problem.inputs.length > 0 ? `
-                <div class="meta-item">
-                    <span class="meta-label">Test Inputs:</span>
-                    <span class="meta-value">${problem.inputs.length}</span>
-                </div>` : ''}
-            </div>
-        </div>
-    `;
-
-    // --- BigOBench view (problem description + per-solution code & complexity) ---
-    if (problem.solutions && problem.solutions.length > 0) {
-        // Problem description
-        if (problem.description) {
-            html += `
-                <div class="card">
-                    <h2>Problem Statement</h2>
-                    <pre class="code-block" style="white-space: pre-wrap;">${escapeHtml(problem.description)}</pre>
-                </div>
-            `;
-        }
-
-        // Each solution: code + complexity
-        problem.solutions.forEach((sol, i) => {
-            html += `
-                <div class="card">
-                    <h2>Solution ${i + 1} <span style="font-size:0.8rem;color:#7f8c8d;font-weight:400;">${escapeHtml(sol.solution_id)}</span></h2>
-                    <div class="complexity-badges" style="margin-bottom: 15px;">
-            `;
-            if (sol.time_complexity) {
-                html += `
-                    <div class="complexity-item">
-                        <span class="complexity-label">Time</span>
-                        <span class="complexity-value">${escapeHtml(sol.time_complexity)}</span>
-                    </div>`;
-            }
-            if (sol.space_complexity) {
-                html += `
-                    <div class="complexity-item">
-                        <span class="complexity-label">Space</span>
-                        <span class="complexity-value">${escapeHtml(sol.space_complexity)}</span>
-                    </div>`;
-            }
-            html += `
-                    </div>
-                    <div class="code-with-tasks">
-                        ${sol.highlighted_code}
-                    </div>
-                </div>
-            `;
-        });
-
-        // Navigation hint
-        html += `
-            <div class="navigation-hint">
-                <strong>Tip:</strong> Use the Previous/Next buttons at the top to browse through problems,
-                or return to the list view to filter by dataset source or search by name.
-            </div>
-        `;
-
-        container.innerHTML = html;
-        window.currentProblem = problem;
-        return;
-    }
-
-    // Source Code card
-    html += `
-        <div class="card">
-            <h2>Source Code</h2>
-            <div class="code-with-tasks" id="code-container">
-                ${problem.highlighted_code}
-            </div>
-        </div>
-    `;
-
-    // --- Non-DREval (simple) view ---
-    if (!hasTasks) {
-        // Show inputs/outputs if available
-        if (problem.inputs && problem.inputs.length > 0) {
-            html += `<div class="card"><h2>Inputs & Outputs</h2>`;
-            problem.inputs.forEach((inp, i) => {
-                const out = (problem.outputs && problem.outputs[i]) || '';
-                html += `
-                    <div class="io-section" style="margin-bottom: 15px;">
-                        <div class="task-section">
-                            <h3>Input ${i + 1}</h3>
-                            <pre class="code-block">${escapeHtml(inp)}</pre>
-                        </div>
-                        <div class="task-section">
-                            <h3>Output</h3>
-                            <pre class="code-block">${escapeHtml(out)}</pre>
-                        </div>
-                    </div>
-                `;
-            });
-            html += `</div>`;
-        }
-
-        // Show test suite if available
-        if (problem.test) {
-            html += `
-                <div class="card">
-                    <h2>Test Suite</h2>
-                    <pre class="code-block">${escapeHtml(problem.test)}</pre>
-                </div>
-            `;
-        }
-
-        // Navigation hint
-        html += `
-            <div class="navigation-hint">
-                <strong>Tip:</strong> Use the Previous/Next buttons at the top to browse through problems,
-                or return to the list view to filter by dataset source or search by name.
-            </div>
-        `;
-
-        container.innerHTML = html;
-        window.currentProblem = problem;
-        return;
-    }
-
-    // --- CRUXEval task view (tasks have given/predict fields, no task_items) ---
-    if (problem.tasks.length > 0 && problem.tasks[0].given !== undefined) {
-        // Task selector
-        html += `
-            <div class="card">
-                <h2>Tasks</h2>
-                <div class="task-selector" id="task-selector">
-        `;
-        problem.tasks.forEach((task, idx) => {
-            html += `
-                <button class="task-btn ${idx === 0 ? 'active' : ''}"
-                        onclick="showCruxTask(${idx})">
-                    ${escapeHtml(task.name)}
-                </button>
-            `;
-        });
-        html += `
-                </div>
-                <div id="task-content"></div>
-            </div>
-        `;
-
-        // Navigation hint
-        html += `
-            <div class="navigation-hint">
-                <strong>Tip:</strong> Use the Previous/Next buttons at the top to browse through problems,
-                or return to the list view to filter by dataset source or search by name.
-            </div>
-        `;
-
-        container.innerHTML = html;
-        window.currentProblem = problem;
-        showCruxTask(0);
-        return;
-    }
-
-    // --- DREval (full) view with tasks, coverage, arrows ---
-    // Rebuild html cleanly with coverage legend and SVG overlay
-    html = `
-        <div class="card">
-            <div class="problem-header">
-                <h2>${escapeHtml(problem.entry_point)}</h2>
-                <span class="badge ${badgeClass(problem.source)}">${escapeHtml(problem.source)}</span>
-            </div>
-            <div class="problem-meta">
-                <div class="meta-item">
-                    <span class="meta-label">Task ID:</span>
-                    <span class="meta-value">${escapeHtml(problem.task_id)}</span>
-                </div>
-                <div class="meta-item">
-                    <span class="meta-label">Index:</span>
-                    <span class="meta-value">${problem.idx}</span>
-                </div>
-                <div class="meta-item">
-                    <span class="meta-label">Dataset:</span>
-                    <span class="meta-value">${escapeHtml(datasetName)}</span>
-                </div>
-                <div class="meta-item">
-                    <span class="meta-label">Test Inputs:</span>
-                    <span class="meta-value">${problem.inputs.length}</span>
-                </div>
-            </div>
-        </div>
-
-        <div class="card">
-            <h2>Source Code</h2>
-            <div class="coverage-legend" id="coverage-legend">
-                <strong>Coverage:</strong>
-                <span class="coverage-legend-item">
-                    <span class="coverage-swatch" style="background:#d4edda; border:1px solid #28a745;"></span>
-                    Executed
-                </span>
-                <span class="coverage-legend-item">
-                    <span class="coverage-swatch" style="background:#f8d7da; border:1px solid #dc3545;"></span>
-                    Not executed
-                </span>
-            </div>
-            <div class="code-with-tasks" id="code-container">
-                ${problem.highlighted_code}
-                <svg id="arrow-overlay" xmlns="http://www.w3.org/2000/svg">
-                    <defs>
-                        <marker id="arrowhead" markerWidth="8" markerHeight="6"
-                                refX="8" refY="3" orient="auto">
-                            <polygon points="0 0, 8 3, 0 6" class="exec-arrow-head"/>
-                        </marker>
-                    </defs>
-                </svg>
-            </div>
-        </div>
-    `;
-
-    // Task selector
-    html += `
-        <div class="card">
-            <h2>Test Cases & Tasks</h2>
-            <p>Select a test input to view associated reasoning tasks:</p>
-            <div class="task-selector" id="task-selector">
-    `;
-
-    problem.tasks.forEach((task, idx) => {
-        html += `
-            <button class="task-btn ${idx === 0 ? 'active' : ''}"
-                    onclick="showTask(${idx})">
-                Input ${task.input_idx + 1}
-            </button>
-        `;
-    });
-
-    html += `
-            </div>
-            <div id="task-content"></div>
-        </div>
-    `;
-
-    // Navigation hint
-    html += `
-        <div class="navigation-hint">
-            <strong>Tip:</strong> Use the Previous/Next buttons at the top to browse through problems,
-            or return to the list view to filter by dataset source or search by name.
-        </div>
-    `;
-
-    container.innerHTML = html;
-
-    // Store problem data globally
-    window.currentProblem = problem;
-
-    // Show first task by default
-    showTask(0);
-}
-
-function injectTaskMarkers(taskItems) {
-    const codePre = document.querySelector('.source .code pre');
-
-    // Save the pristine original innerHTML once, before any modification.
-    if (codePre && !window._codePreOriginalHtml) {
-        window._codePreOriginalHtml = codePre.innerHTML;
-    }
-
-    // Invalidate span cache (rebuilt lazily on next arrow draw)
-    window._linenoSpanCache = null;
-
-    // Store current task items so applyCoverage can re-add markers after wrapping.
-    window._currentTaskItems = taskItems || [];
-
-    // Reset code pre to original, then add markers from scratch.
-    if (codePre && window._codePreOriginalHtml) {
-        codePre.innerHTML = window._codePreOriginalHtml;
-    }
-
-    if (!taskItems || taskItems.length === 0) {
-        return;
-    }
-
-    // Group tasks by line number
-    const tasksByLine = {};
-    taskItems.forEach(item => {
-        if (!tasksByLine[item.lineno]) tasksByLine[item.lineno] = [];
-        tasksByLine[item.lineno].push(item.var);
-    });
-
-    // Inject task marker badges into the code pre
-    if (!codePre) return;
-    const codeLines = codePre.innerHTML.split('\n');
-    codePre.innerHTML = codeLines.map((line, idx) => {
-        const lineNum = idx + 1;
-        if (tasksByLine[lineNum] && line.trim() !== '') {
-            const vars = tasksByLine[lineNum];
-            return line + `<span class="task-marker" data-lineno="${lineNum}" data-vars="${escapeHtml(vars.join(', '))}">${escapeHtml(vars.join(', '))}</span>`;
-        }
-        return line;
-    }).join('\n');
-}
-
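The grouping step removed above (collecting the queried variable names per source line before injecting marker badges) is a plain dictionary-of-lists build. A minimal DOM-free sketch, assuming the same `{ lineno, var }` item shape the template iterates over:

```javascript
// Sketch of injectTaskMarkers' grouping step: map each source line
// number to the list of variable names queried on that line.
function groupTasksByLine(taskItems) {
  const tasksByLine = {};
  for (const item of taskItems) {
    if (!tasksByLine[item.lineno]) tasksByLine[item.lineno] = [];
    tasksByLine[item.lineno].push(item.var);
  }
  return tasksByLine;
}

// Two variables queried on line 3, one on line 7:
const grouped = groupTasksByLine([
  { lineno: 3, var: 'x' },
  { lineno: 3, var: 'y' },
  { lineno: 7, var: 'total' },
]);
console.log(grouped[3].join(', ')); // "x, y"
console.log(grouped[7].join(', ')); // "total"
```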
-function applyCoverage(coverageSet, totalLines) {
-    // Remove previous coverage classes from lineno spans.
-    // Pygments structure: td.linenos > div.linenodiv > pre > span.normal
-    // These are individual elements — adding/removing classes has no layout impact.
-    document.querySelectorAll('td.linenos .normal').forEach(el => {
-        el.classList.remove('line-executed', 'line-not-executed');
-    });
-
-    if (!coverageSet) {
-        const legend = document.getElementById('coverage-legend');
-        if (legend) legend.style.display = 'none';
-        return;
-    }
-
-    const legend = document.getElementById('coverage-legend');
-    if (legend) legend.style.display = 'block';
-
-    // Color lineno spans only. We never touch codePre.innerHTML here so:
-    // 1. The table layout is never disturbed (no alignment issue).
-    // 2. Task markers injected by injectTaskMarkers are left untouched.
-    document.querySelectorAll('td.linenos .normal').forEach(span => {
-        const lineNum = parseInt(span.textContent.trim());
-        if (!isNaN(lineNum) && lineNum <= totalLines) {
-            span.classList.add(coverageSet.has(lineNum) ? 'line-executed' : 'line-not-executed');
-        }
-    });
-}
-
-// Global map: lineno -> list of next line numbers (1-indexed; -1 = end of trace)
-window._nextLinesMap = {};
-
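The per-line rule applyCoverage applies can be isolated from the DOM walk. A sketch of that rule (function name is mine, not the template's): executed lines get one class, unexecuted lines in range get the other, and anything past `totalLines` is left untouched.

```javascript
// DOM-free sketch of applyCoverage's per-line decision: which CSS
// class (if any) a given 1-indexed line number should receive.
function coverageClass(lineNum, coverageSet, totalLines) {
  if (isNaN(lineNum) || lineNum > totalLines) return null; // out of range: no class
  return coverageSet.has(lineNum) ? 'line-executed' : 'line-not-executed';
}

const covered = new Set([1, 2, 4]);
console.log(coverageClass(2, covered, 5)); // "line-executed"
console.log(coverageClass(3, covered, 5)); // "line-not-executed"
console.log(coverageClass(9, covered, 5)); // null
```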
-async function loadAndApplyGroundTruth(problemIdx, inputIdx, taskItems) {
-    // Show "loading" placeholders on all task items
-    taskItems.forEach(item => {
-        const el = document.getElementById(`gt-${item.lineno}-${item.var}`);
-        if (el) { el.textContent = '…'; el.className = 'gt-answer loading'; }
-    });
-
-    // Clear next-lines data from previous input
-    window._nextLinesMap = {};
-
-    try {
-        const resp = await fetch(`/api/${datasetSlug}/problem/${problemIdx}/ground_truth/${inputIdx}`);
-        const gt = await resp.json();
-
-        if (gt.status !== 'ok') {
-            taskItems.forEach(item => {
-                const el = document.getElementById(`gt-${item.lineno}-${item.var}`);
-                if (el) { el.textContent = gt.status === 'error' ? '(exec error)' : '(unavailable)'; el.className = 'gt-answer'; }
-            });
-            applyCoverage(null, 0);
-            return;
-        }
-
-        // Apply coverage highlighting
-        const coverageSet = new Set(gt.coverage);
-        applyCoverage(coverageSet, gt.total_lines);
-
-        // Fill in variable answers
-        const answerMap = {};
-        gt.variable_answers.forEach(a => {
-            answerMap[`${a.lineno}-${a.var}`] = a.answer_str;
-        });
-        taskItems.forEach(item => {
-            const el = document.getElementById(`gt-${item.lineno}-${item.var}`);
-            if (el) {
-                const answer = answerMap[`${item.lineno}-${item.var}`] || '(not available)';
-                el.textContent = answer;
-                el.className = 'gt-answer';
-            }
-        });
-
-        // Store next-lines data for arrow visualization
-        if (gt.next_lines_answers) {
-            gt.next_lines_answers.forEach(a => {
-                window._nextLinesMap[a.lineno] = a.next_lines;
-            });
-        }
-
-        // Attach hover handlers to task-marker spans now that we have next-lines data
-        attachArrowHoverHandlers();
-
-    } catch (e) {
-        taskItems.forEach(item => {
-            const el = document.getElementById(`gt-${item.lineno}-${item.var}`);
-            if (el) { el.textContent = '(error)'; el.className = 'gt-answer'; }
-        });
-    }
-}
-
-// Cache of lineNum → DOM span, rebuilt whenever injectTaskMarkers runs.
-window._linenoSpanCache = null;
-
-function buildLinenoSpanCache(container) {
-    const cache = {};
-    container.querySelectorAll('td.linenos .normal').forEach(span => {
-        const n = parseInt(span.textContent.trim());
-        if (!isNaN(n)) cache[n] = span;
-    });
-    window._linenoSpanCache = cache;
-}
-
-/**
- * Get the bounding rect of the lineno span for a given 1-indexed line number,
- * relative to the code container element. Uses a cached span map.
- */
-function getLinenoSpanRect(lineNum, container) {
-    if (!window._linenoSpanCache) buildLinenoSpanCache(container);
-    const span = window._linenoSpanCache[lineNum];
-    if (!span) return null;
-    const spanRect = span.getBoundingClientRect();
-    const containerRect = container.getBoundingClientRect();
-    return {
-        top: spanRect.top - containerRect.top + container.scrollTop,
-        bottom: spanRect.bottom - containerRect.top + container.scrollTop,
-        left: spanRect.left - containerRect.left,
-        right: spanRect.right - containerRect.left,
-        width: spanRect.width,
-        height: spanRect.height,
-        midY: (spanRect.top + spanRect.bottom) / 2 - containerRect.top + container.scrollTop,
-    };
-}
-
-/**
- * Draw arrows from sourceLine to each of the targetLines in the SVG overlay.
- * Lines are 1-indexed. -1 means "end of execution" (no arrow drawn).
- */
-function drawArrows(sourceLineNum, targetLineNums) {
-    const container = document.getElementById('code-container');
-    const svg = document.getElementById('arrow-overlay');
-    if (!container || !svg) return;
-
-    // Remove previous arrows (but keep defs)
-    svg.querySelectorAll('.arrow-path').forEach(el => el.remove());
-
-    const srcRect = getLinenoSpanRect(sourceLineNum, container);
-    if (!srcRect) return;
-
-    // Update SVG height to match container
-    svg.setAttribute('height', container.scrollHeight);
-
-    targetLineNums.forEach(targetLineNum => {
-        if (targetLineNum === -1) return; // end of trace — no arrow
-
-        const dstRect = getLinenoSpanRect(targetLineNum, container);
-        if (!dstRect) return;
-
-        // Start point: right edge of source lineno span, vertically centered
-        const x1 = srcRect.right + 2;
-        const y1 = srcRect.midY;
-
-        // End point: right edge of target lineno span, vertically centered
-        const x2 = dstRect.right + 2;
-        const y2 = dstRect.midY;
-
-        // Horizontal offset for the bezier control points — curves to the right
-        const curveOffset = Math.max(30, Math.abs(y2 - y1) * 0.4);
-
-        // Cubic bezier: both control points extend to the right of the lineno column
-        const cx1 = x1 + curveOffset;
-        const cy1 = y1;
-        const cx2 = x2 + curveOffset;
-        const cy2 = y2;
-
-        const path = document.createElementNS('http://www.w3.org/2000/svg', 'path');
-        path.setAttribute('d', `M ${x1} ${y1} C ${cx1} ${cy1}, ${cx2} ${cy2}, ${x2} ${y2}`);
-        path.setAttribute('class', 'exec-arrow arrow-path');
-        path.setAttribute('marker-end', 'url(#arrowhead)');
-        svg.appendChild(path);
-    });
-}
-
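The geometry in drawArrows above is the interesting part: the arrow is a cubic bezier whose two control points sit `curveOffset` pixels to the right of the line-number column, so arcs between distant lines bulge further out. A DOM-free sketch of just the path-string computation (function name and coordinates are mine):

```javascript
// Sketch of drawArrows' SVG path math: cubic bezier from (x1, y1)
// to (x2, y2) with both control points pushed right by an offset
// that grows with the vertical distance (min 30px).
function arrowPath(x1, y1, x2, y2) {
  const curveOffset = Math.max(30, Math.abs(y2 - y1) * 0.4);
  const cx1 = x1 + curveOffset;
  const cx2 = x2 + curveOffset;
  return `M ${x1} ${y1} C ${cx1} ${y1}, ${cx2} ${y2}, ${x2} ${y2}`;
}

// 100px apart vertically: offset = max(30, 40) = 40
console.log(arrowPath(40, 10, 40, 110)); // "M 40 10 C 80 10, 80 110, 40 110"
// 20px apart: the 30px floor kicks in
console.log(arrowPath(40, 10, 40, 30));  // "M 40 10 C 70 10, 70 30, 40 30"
```

The floor keeps short hops visibly curved instead of collapsing into a near-vertical line.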
-/**
- * Clear all arrows from the SVG overlay.
- */
-function clearArrows() {
-    const svg = document.getElementById('arrow-overlay');
-    if (svg) {
-        svg.querySelectorAll('.arrow-path').forEach(el => el.remove());
-    }
-}
-
-// AbortController for the current set of marker hover listeners.
-let _markerListenersAbort = null;
-
-/**
- * Attach mouseenter/mouseleave handlers to all .task-marker spans so that
- * hovering shows execution-flow arrows to next lines.
- */
-function attachArrowHoverHandlers() {
-    // Cancel any previously attached listeners without touching the DOM.
-    if (_markerListenersAbort) _markerListenersAbort.abort();
-    _markerListenersAbort = new AbortController();
-    const { signal } = _markerListenersAbort;
-
-    document.querySelectorAll('.task-marker').forEach(marker => {
-        marker.addEventListener('mouseenter', () => {
-            const lineNum = parseInt(marker.dataset.lineno);
-            if (!lineNum) return;
-            const nextLines = window._nextLinesMap[lineNum];
-            if (nextLines && nextLines.length > 0) {
-                drawArrows(lineNum, nextLines);
-            }
-        }, { signal });
-
-        marker.addEventListener('mouseleave', () => {
-            clearArrows();
-        }, { signal });
-    });
-}
-
-function showCruxTask(taskIdx) {
-    const problem = window.currentProblem;
-    const task = problem.tasks[taskIdx];
-
-    // Update active button
-    document.querySelectorAll('.task-btn').forEach((btn, idx) => {
-        btn.classList.toggle('active', idx === taskIdx);
-    });
-
-    const givenLabel = task.given === 'input' ? 'Input (given)' : 'Output (given)';
-    const predictLabel = task.predict === 'output' ? 'Output (predict)' : 'Input (predict)';
-    const givenValue = task.given === 'input' ? task.input : task.output;
-    const predictValue = task.predict === 'output' ? task.output : task.input;
-
-    const html = `
-        <div class="task-details">
-            <div class="task-section">
-                <p style="margin-bottom: 12px; color: #7f8c8d;">${escapeHtml(task.description)}</p>
-            </div>
-            <div class="io-section">
-                <div class="task-section">
-                    <h3>${escapeHtml(givenLabel)}</h3>
-                    <pre class="code-block">${escapeHtml(givenValue)}</pre>
-                </div>
-                <div class="task-section">
-                    <h3>${escapeHtml(predictLabel)}</h3>
-                    <pre class="code-block crux-answer">${escapeHtml(predictValue)}</pre>
-                </div>
-            </div>
-        </div>
-    `;
-
-    document.getElementById('task-content').innerHTML = html;
-}
-
-function showTask(taskIdx) {
-    const problem = window.currentProblem;
-    const task = problem.tasks[taskIdx];
-
-    // Update active button
-    const buttons = document.querySelectorAll('.task-btn');
-    buttons.forEach((btn, idx) => {
-        if (idx === taskIdx) {
-            btn.classList.add('active');
-        } else {
-            btn.classList.remove('active');
-        }
-    });
-
-    // Inject task markers into the code
-    injectTaskMarkers(task.task_items);
-
-    // Clear previous coverage while new one loads
-    applyCoverage(null, 0);
-
-    // Render task content
-    // For HumanEval: Input + Expected Output side by side.
-    // For ClassEval: Input alone (side by side layout), then Test Class below full-width.
-    const ioSection = task.test_class_code
-        ? `<div class="io-section">
-               <div class="task-section">
-                   <h3>Input</h3>
-                   <pre class="code-block">${escapeHtml(task.input)}</pre>
-               </div>
-           </div>
-           <div class="task-section">
-               <h3>Test Class — <code>${escapeHtml(task.test_class_name)}</code></h3>
-               <pre class="code-block">${escapeHtml(task.test_class_code)}</pre>
-           </div>`
-        : `<div class="io-section">
-               <div class="task-section">
-                   <h3>Input</h3>
-                   <pre class="code-block">${escapeHtml(task.input)}</pre>
-               </div>
-               <div class="task-section">
-                   <h3>Expected Output</h3>
-                   <pre class="code-block">${escapeHtml(task.output)}</pre>
-               </div>
-           </div>`;
-
-    let html = `
-        <div class="task-details">
-            ${ioSection}
-    `;
-
-    // Show task items with ground truth answer placeholders
-    if (task.task_items && task.task_items.length > 0) {
-        html += `
-            <div class="task-section">
-                <h3>Reasoning Tasks</h3>
-                <p style="margin-bottom: 10px; color: #7f8c8d;">
-                    Variable state at each execution point (correct answer shown in
-                    <span style="background:#17a2b8;color:white;padding:1px 6px;border-radius:3px;font-size:0.82rem;">teal</span>):
-                </p>
-                <ul class="task-items-list">
-        `;
-
-        task.task_items.forEach(item => {
-            html += `
-                <li>
-                    <span class="line-ref">Line ${item.lineno}</span>
-                    <span class="var-name">${escapeHtml(item.var)}</span>
-                    <span class="gt-answer loading" id="gt-${item.lineno}-${item.var}">…</span>
-                </li>
-            `;
-        });
-
-        html += `
-                </ul>
-            </div>
-        `;
-    }
-
-    // Show output prediction task if exists
-    if (task.output_pred) {
-        html += `
-            <div class="task-section">
-                <h3>Output Completion Task</h3>
-                <p style="margin-bottom: 10px; color: #7f8c8d;">
-                    The model needs to complete this test assertion:
-                </p>
-                <pre class="code-block">${escapeHtml(task.output_pred)}</pre>
-            </div>
-        `;
-    }
-
-    html += `</div>`;
-
-    document.getElementById('task-content').innerHTML = html;
-
-    // Fetch and apply ground truth (coverage + variable answers)
-    if (hasGroundTruth && task.task_items) {
-        loadAndApplyGroundTruth(problem.idx, task.input_idx, task.task_items);
-    }
-}
-
-function escapeHtml(text) {
-    if (text === null || text === undefined) return '';
-    const div = document.createElement('div');
-    div.textContent = text;
-    return div.innerHTML;
-}
-
-loadProblem();
</script>
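The removed escapeHtml leans on a detached `<div>` (set `textContent`, read back `innerHTML`), which only works in a browser. Outside the DOM the same escaping can be sketched with string replacement. Note this is an approximation, not the template's exact function: `textContent`/`innerHTML` round-tripping escapes `&`, `<`, and `>` but leaves quotes alone, which is safe for element content though not for attribute values.

```javascript
// DOM-free sketch of the template's escapeHtml: escape the three
// characters that are special in HTML element content.
function escapeHtml(text) {
  if (text === null || text === undefined) return '';
  return String(text)
    .replace(/&/g, '&amp;')   // must run first so later entities survive
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

console.log(escapeHtml('<b>1 & 2</b>')); // "&lt;b&gt;1 &amp; 2&lt;/b&gt;"
console.log(escapeHtml(null));           // ""
```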
{% endblock %}

{% endblock %}

{% block extra_css %}
+{{ css|safe }}
+{% endblock %}
+{% block extra_head %}
+<link rel="stylesheet" href="{{ url_for('static', filename='problem.css') }}">
{% endblock %}

{% block content %}

const datasetName = {{ dataset_name|tojson }};
const hasGroundTruth = {{ has_ground_truth|tojson }};
const hasTasks = {{ has_tasks|tojson }};
</script>
+<script src="{{ url_for('static', filename='problem.js') }}"></script>
{% endblock %}