TITLE = """<h1 align="center" id="space-title">RustMizan Leaderboard</h1>"""

INTRODUCTION_TEXT = """
RustMizan benchmarks Large Language Models on real-world Rust memory-safety vulnerabilities drawn from CVEs and security advisories.

**See the About, Metrics, and Variants tabs for detailed information.**
"""

LLM_BENCHMARKS_TEXT = """
# RustMizan: Evaluating LLMs for Rust Vulnerability Detection

## The Task

Models are given the complete codebase of a Rust crate and must:

1. **Detect**: Determine whether the code contains a vulnerability (binary classification)
2. **Classify**: Identify the CWE type(s) of the vulnerability
3. **Localize**: Pinpoint the vulnerable functions and line numbers
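
The three required outputs can be pictured as a single answer record per crate. The sketch below is hypothetical. The `Prediction` class and its field names are illustrative and do not describe RustMizan's actual output format.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    # Hypothetical per-crate answer; field names are illustrative only.
    is_vulnerable: bool                          # 1. Detect: binary verdict
    cwes: set = field(default_factory=set)       # 2. Classify: e.g. {"CWE-416"}
    functions: set = field(default_factory=set)  # 3. Localize: function names
    lines: set = field(default_factory=set)      # 3. Localize: (file, line) pairs


# A vulnerable verdict with a use-after-free CWE and one pinpointed location.
answer = Prediction(
    is_vulnerable=True,
    cwes={"CWE-416"},
    functions={"Buffer::push"},
    lines={("src/buffer.rs", 42)},
)

# A benign verdict, leaving the other fields empty.
clean = Prediction(is_vulnerable=False)
print(clean.cwes, clean.functions)  # set() set()
```

The `default_factory` fields give every record its own empty sets rather than one shared mutable default.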
"""

METRICS_TEXT = """
# Metrics Explained

- **CVC Accuracy (Binary Accuracy)**: Percentage of samples correctly classified as vulnerable or benign, reported as both a percentage and the count of correct predictions out of all samples.
- **CWE F1/Precision/Recall**: How well the model identifies the correct CWE types. Scores are micro-averaged: true positives (TP), false positives (FP), and false negatives (FN) are summed across all samples before precision, recall, and F1 are computed.
- **Function F1/Precision/Recall**: Micro-averaged scores for localizing vulnerable functions.
- **Line F1/Precision/Recall**: Micro-averaged scores for identifying the exact vulnerable lines.
- **Success@1-Function**: Percentage of vulnerable samples for which at least one correct function was identified, reported as a percentage and a count over vulnerable samples only.
- **Success@1-Line**: Percentage of vulnerable samples for which at least one correct line was identified, reported as a percentage and a count over vulnerable samples only.
- **Sample Counts**: Total number of samples evaluated and the number of vulnerable samples among them.
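
A minimal Python sketch of these definitions follows. It is an illustration, not RustMizan's scoring code. The per-sample dictionary and its keys (`gold_vuln`, `pred_cwes`, and so on) are hypothetical. Because the counts are summed over all samples, the sketch assumes that any CWE, function, or line predicted for a benign sample counts as a false positive.

```python
# Hypothetical scoring sketch. Each sample is a dict with a gold verdict
# ("gold_vuln"), a predicted verdict ("pred_vuln"), and gold/predicted sets for
# each target: "cwes", "functions", or "lines" (e.g. (file, line) pairs).


def binary_accuracy(samples):
    # Fraction and count of samples whose vulnerable/benign verdict is correct.
    correct = sum(s["pred_vuln"] == s["gold_vuln"] for s in samples)
    return correct / len(samples), correct


def micro_prf(samples, key):
    # Micro-averaging: sum TP/FP/FN over all samples, then score once.
    tp = fp = fn = 0
    for s in samples:
        gold, pred = s["gold_" + key], s["pred_" + key]
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def success_at_1(samples, key):
    # Fraction and count of vulnerable samples with at least one correct item.
    vulnerable = [s for s in samples if s["gold_vuln"]]
    hits = sum(bool(s["gold_" + key] & s["pred_" + key]) for s in vulnerable)
    return hits / len(vulnerable), hits


samples = [
    {"gold_vuln": True, "pred_vuln": True,
     "gold_cwes": {"CWE-416"}, "pred_cwes": {"CWE-416", "CWE-787"},
     "gold_functions": {"Buffer::push"}, "pred_functions": {"Buffer::push"}},
    {"gold_vuln": True, "pred_vuln": False,
     "gold_cwes": {"CWE-787"}, "pred_cwes": set(),
     "gold_functions": {"Buffer::grow"}, "pred_functions": set()},
    {"gold_vuln": False, "pred_vuln": False,
     "gold_cwes": set(), "pred_cwes": set(),
     "gold_functions": set(), "pred_functions": set()},
]
print(micro_prf(samples, "cwes"))          # (0.5, 0.5, 0.5)
print(success_at_1(samples, "functions"))  # (0.5, 1)
```

Because micro-averaging pools the counts, samples with many gold items weigh more heavily than samples with few.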
"""

VARIANTS_TEXT = """
# Dataset Variants Explained

All mutations are **semantics-preserving**: they change the code's syntax without altering program behavior. Each dataset variant aggregates a fixed set of mutations.

## Baseline

- **Vanilla (Baseline)**: Original, unmodified code samples from CVEs and security advisories

## Benign (Contamination)

These mutations target token-level memorization to detect training data leakage. They transform surface-level syntax so that memorized snippets no longer match, while preserving the underlying vulnerability.

Mutations applied:
- **remove-comments**: Remove all Rust comments
- **format-compact**: Apply compact `rustfmt` formatting
- **mizan-mut-for-to-while**: Convert `for` loops to `while` loops
- **mizan-mut-while-to-loop**: Convert `while` loops to `loop` blocks with breaks
- **mizan-mut-if-else-reorder**: Reorder if-else branches by negating conditions
- **benign-comments**: Insert neutral comments around vulnerable lines
- **benign-blocks**: Insert neutral code blocks around vulnerable lines
- **benign-rename-fn**: Rename functions to neutral names (e.g., `fn_1_abc123`)
- **benign-rename-var**: Rename variables to neutral names (e.g., `var_1_xyz789`)
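
The two renaming mutations can be approximated as shown below. This is a hypothetical sketch, not the RustMizan tool. `neutral_names` and `rename_identifiers` are illustrative names, and the seeded random suffix is an assumption chosen so that runs are reproducible. Because the sketch scans characters instead of parsing Rust, it would also rename any same-spelled identifier inside string literals, comments, or macros.

```python
import random


def neutral_names(names, prefix="fn", seed=0):
    # Map each original name to <prefix>_<index>_<6 hex digits>.
    # A seeded generator makes the mapping reproducible across runs.
    rng = random.Random(seed)
    return {
        name: f"{prefix}_{i}_{rng.getrandbits(24):06x}"
        for i, name in enumerate(names, start=1)
    }


def rename_identifiers(src, mapping):
    # Replace whole identifiers only, so renaming `len` leaves `length` intact.
    # Note: this scan does not skip string literals, comments, or macros.
    out, i = [], 0
    while i < len(src):
        if src[i].isalpha() or src[i] == "_":
            j = i
            while j < len(src) and (src[j].isalnum() or src[j] == "_"):
                j += 1
            word = src[i:j]
            out.append(mapping.get(word, word))
            i = j
        else:
            out.append(src[i])
            i += 1
    return "".join(out)


src = "fn parse_header(buf: &[u8]) -> usize { parse_header_len(buf) }"
fn_map = neutral_names(["parse_header"], prefix="fn")
var_map = neutral_names(["buf"], prefix="var")
print(rename_identifiers(src, {**fn_map, **var_map}))
```

Separate prefixes (`fn`, `var`) mirror the two naming patterns listed above. `parse_header_len` is left untouched because only whole identifiers are matched.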

## Malignant (Robustness)

These mutations inject adversarial patterns into the code to test whether a model can resist misleading cues and still identify the vulnerability.

Mutations applied:
- **remove-comments**: Remove all Rust comments
- **malignant-comments**: Insert comments falsely suggesting code is safe
- **malignant-blocks**: Insert code blocks falsely suggesting safety
- **malignant-rename-fn**: Rename functions to names suggesting safety (e.g., `safe_fn_1`)
- **malignant-rename-var**: Rename variables to names suggesting safety (e.g., `secure_var_1`)
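
The malignant-comments mutation can be sketched as follows. The comment wording, the `insert_misleading_comments` name, and the returned line-number map are illustrative assumptions, not RustMizan's implementation. The map is included because every inserted line shifts the lines below it, so any line-level ground truth would have to be remapped to stay aligned.

```python
# Hypothetical misleading comment; the real mutation's wording is not specified.
MISLEADING = "// SAFETY: the index is always checked by the caller, so this is sound."


def insert_misleading_comments(lines, targets, comment=MISLEADING):
    # `lines` is the source split into lines; `targets` holds 1-based numbers
    # of the vulnerable lines. Each target gets the comment directly above it,
    # at the same indentation.
    out, new_numbers = [], {}
    for number, line in enumerate(lines, start=1):
        if number in targets:
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + comment)
        out.append(line)
        new_numbers[number] = len(out)  # where this original line now sits
    return out, new_numbers


src = [
    "fn get(v: &[u8], i: usize) -> u8 {",
    "    unsafe { *v.get_unchecked(i) }",
    "}",
]
mutated, remap = insert_misleading_comments(src, {2})
for line in mutated:
    print(line)
print(remap)  # {1: 1, 2: 3, 3: 4}
```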

## Rust-Specific

Structural transformations that leverage Rust-specific syntax features.

Mutations applied:
- **remove-comments**: Remove all Rust comments
- **mizan-mut-derive-reorder**: Randomly reorder traits in `#[derive(...)]` attributes
- **mizan-mut-trait-bound-reorder**: Randomly reorder trait bounds in `where` clauses
- **mizan-mut-use-reorder**: Randomly reorder items in `use` statements
- **mizan-mut-arithmetic-identity**: Wrap integer literals in an arithmetic identity (e.g., `N` becomes `N * 1`)
- **mizan-mut-explicit-where**: Add an explicit `where` clause to function signatures
- **mizan-mut-rename-lifetime**: Rename lifetime parameters in standalone functions
- **mizan-mut-extraneous-unsafe**: Add extraneous `unsafe { ... }` blocks around statements inside functions
- **mizan-mut-impl-trait-to-generic**: Convert `impl Trait` bounds into generic parameters
- **mizan-mut-option-wrap**: Wrap expressions in redundant `Some(...).unwrap()` calls
- **mizan-mut-maybeuninit-wrap**: Wrap known-safe values in a `MaybeUninit<T>` and automatically read them back out
- **mizan-mut-manuallydrop-wrap**: Place owned variables in `ManuallyDrop` wrappers and unwrap them later
- **mizan-mut-explicit-return**: Convert implicit returns (tail expressions) into explicit `return` statements
- **mizan-mut-unreachable-panic**: Add an unreachable `panic!()` to function bodies
- **mizan-mut-repeated-shadowing**: Add redundant, repeated shadowing `let` bindings within a scope
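
As an example from the Rust-specific group, `mizan-mut-derive-reorder` might look roughly like the sketch below. It is a hypothetical, line-based approximation. It assumes the attribute sits on one line with no nested parentheses, whereas a robust tool would rewrite the parsed syntax tree. A seeded `random.Random` is passed in so that a dataset variant can be regenerated identically.

```python
import random


def reorder_derive(line, rng):
    # Shuffle the trait list of a single-line `#[derive(...)]` attribute.
    # Lines without a derive attribute are returned unchanged.
    start = line.find("#[derive(")
    if start == -1:
        return line
    open_end = start + len("#[derive(")
    close = line.find(")]", open_end)
    if close == -1:
        return line
    # Split on commas and drop empty entries (e.g. from a trailing comma).
    traits = [t.strip() for t in line[open_end:close].split(",") if t.strip()]
    rng.shuffle(traits)
    return line[:open_end] + ", ".join(traits) + line[close:]


rng = random.Random(42)
print(reorder_derive("#[derive(Debug, Clone, PartialEq, Eq, Hash)]", rng))
print(reorder_derive("struct Buffer {", rng))  # struct Buffer {
```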
"""

CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = r"""
"""