leaderboard / src /about.py

tareknaser

chore: update data with agentic results

a83c01d unverified about 2 months ago

	TITLE = """<h1 align="center" id="space-title">RustMizan Leaderboard</h1>"""

	INTRODUCTION_TEXT = """
	RustMizan benchmarks Large Language Models on real-world Rust memory safety vulnerabilities from CVEs and security advisories.

	See the About, Metrics, and Variants tabs for detailed information.
	"""

	LLM_BENCHMARKS_TEXT = """
	# RustMizan: Evaluating LLMs for Rust Vulnerability Detection

	## The Task

	Models are given the complete codebase of a Rust crate and must:
	1. Detect: Determine if the code contains a vulnerability (binary classification)
	2. Classify: Identify the CWE type(s) of the vulnerability
	3. Localize: Pinpoint vulnerable functions and line numbers
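
As a rough illustration of the three outputs listed above, a single prediction could be represented as the record below. This is a hypothetical sketch: the `Prediction` class, its field names, and the example values are assumptions made for illustration, not RustMizan's actual output schema.

```python
from dataclasses import dataclass, field


@dataclass
class Prediction:
    # Hypothetical record; the field names are illustrative only.
    is_vulnerable: bool                                        # 1. Detect
    cwe_ids: set[str] = field(default_factory=set)             # 2. Classify
    functions: set[str] = field(default_factory=set)           # 3. Localize (functions)
    lines: set[tuple[str, int]] = field(default_factory=set)   # 3. Localize (file, line)


# Made-up values, only to show the shape of a prediction.
example = Prediction(
    is_vulnerable=True,
    cwe_ids={"CWE-416"},
    functions={"Buffer::release"},
    lines={("src/buffer.rs", 42)},
)
```

Storing the CWE, function, and line predictions as sets makes it straightforward to compare them against ground-truth sets when scoring.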

	"""

	METRICS_TEXT = """
	# Metrics Explained

	- CVC Accuracy (Binary Accuracy): Percentage of samples correctly classified as vulnerable/benign. Shows percentage and count of correct predictions out of total samples.
	- CWE F1/Precision/Recall: Metrics for identifying the correct CWE types, micro-averaged by summing true positives, false positives, and false negatives (TP/FP/FN) across all samples.
	- Function F1/Precision/Recall: Metrics for localizing vulnerable functions using micro-averaged scores.
	- Line F1/Precision/Recall: Metrics for identifying exact vulnerable lines using micro-averaged scores.
	- Success@1-Function: Percentage of vulnerable samples where at least one correct function was identified. Shows percentage and count out of vulnerable samples only.
	- Success@1-Line: Percentage of vulnerable samples where at least one correct line was identified. Shows percentage and count out of vulnerable samples only.
	- Sample Counts: Total number of samples evaluated and number of vulnerable samples.
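
The micro-averaged scores and Success@1 described above can be sketched as follows. This is a minimal illustration, not the leaderboard's actual scoring code: the function names are hypothetical, predictions and ground truth are assumed to be sets per sample, a sample is treated as vulnerable when its ground-truth set is non-empty, and undefined ratios are assumed to be reported as 0.

```python
def micro_prf(samples):
    # samples: list of (predicted_set, gold_set) pairs, one pair per sample.
    tp = fp = fn = 0
    for predicted, gold in samples:
        tp += len(predicted & gold)   # predicted and correct
        fp += len(predicted - gold)   # predicted but not in the ground truth
        fn += len(gold - predicted)   # in the ground truth but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def success_at_1(samples):
    # Count vulnerable samples (non-empty gold set) with at least one correct hit.
    vulnerable = [(p, g) for p, g in samples if g]
    hits = sum(1 for p, g in vulnerable if p & g)
    return hits, len(vulnerable)


# Made-up CWE predictions for three samples; the last sample is benign.
samples = [
    ({"CWE-416"}, {"CWE-416", "CWE-415"}),
    ({"CWE-787"}, {"CWE-125"}),
    (set(), set()),
]
precision, recall, f1 = micro_prf(samples)
```

Because the counts are pooled before dividing, samples with many ground-truth items weigh more than samples with few, which is the defining property of micro-averaging.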
	"""

	VARIANTS_TEXT = """
	# Dataset Variants Explained

	All mutations are semantics-preserving: they change code syntax without altering program behavior. Each dataset variant aggregates a fixed set of mutations.

	## Baseline
	- Vanilla (Baseline): Original unmodified code samples from CVEs and security advisories

	## Benign (Contamination)
	These mutations target token-level memorization to detect training data leakage. They transform surface-level syntax so that memorized snippets no longer match, while preserving the underlying vulnerability.

	Mutations applied:
	- remove-comments: Remove all Rust comments
	- format-compact: Apply compact `rustfmt` formatting
	- mizan-mut-for-to-while: Convert `for` loops to `while` loops
	- mizan-mut-while-to-loop: Convert `while` loops to `loop` blocks with breaks
	- mizan-mut-if-else-reorder: Reorder if-else branches by negating conditions
	- benign-comments: Insert neutral comments around vulnerable lines
	- benign-blocks: Insert neutral code blocks around vulnerable lines
	- benign-rename-fn: Rename functions to neutral names (e.g., `fn_1_abc123`)
	- benign-rename-var: Rename variables to neutral names (e.g., `var_1_xyz789`)
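
To make the renaming mutations more concrete, here is a toy sketch of renaming functions to neutral names. It is not the RustMizan implementation: `neutral_name` and `rename_functions` are hypothetical helpers, the hash-based suffix is an assumed naming scheme, and the text-level regex would also touch matching identifiers inside strings or comments, which a real tool working on the syntax tree would avoid.

```python
import hashlib
import re


def neutral_name(index, original):
    # Assumed scheme: a counter plus a short hash of the original name.
    suffix = hashlib.sha1(original.encode()).hexdigest()[:6]
    return "fn_" + str(index) + "_" + suffix


def rename_functions(source, names):
    # Replace whole-identifier occurrences of each name in the Rust source text.
    for index, name in enumerate(names, start=1):
        pattern = "(?<![A-Za-z0-9_])" + re.escape(name) + "(?![A-Za-z0-9_])"
        source = re.sub(pattern, neutral_name(index, name), source)
    return source


rust_source = "fn parse_len(b: &[u8]) -> usize { b.len() } fn main() { parse_len(&[1]); }"
renamed = rename_functions(rust_source, ["parse_len"])
```

Both the definition and the call site are renamed consistently, so the program's behavior is unchanged while the memorizable identifier disappears.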

	## Malignant (Robustness)
	These mutations inject adversarial patterns into the code to test whether an agent can resist misleading cues and still identify the vulnerability.

	Mutations applied:
	- remove-comments: Remove all Rust comments
	- malignant-comments: Insert comments falsely suggesting code is safe
	- malignant-blocks: Insert code blocks falsely suggesting safety
	- malignant-rename-fn: Rename functions to names suggesting safety (e.g., `safe_fn_1`)
	- malignant-rename-var: Rename variables to names suggesting safety (e.g., `secure_var_1`)
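
A misleading-comment mutation can be sketched as below. This is an illustrative assumption, not the RustMizan implementation: `insert_misleading_comments`, its default comment text, and the line-based input format are all hypothetical. The sketch also returns the shifted line numbers, since inserting lines moves the vulnerable lines further down the file.

```python
def insert_misleading_comments(
    source_lines,
    vulnerable_lines,
    comment="// SAFETY: bounds are checked by the caller.",
):
    # Insert `comment` above each 1-based line number in `vulnerable_lines`.
    # Returns the mutated lines and the new positions of the vulnerable lines.
    targets = set(vulnerable_lines)
    out, shifted = [], []
    for number, line in enumerate(source_lines, start=1):
        if number in targets:
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + comment)
            shifted.append(len(out) + 1)  # 1-based position after insertion
        out.append(line)
    return out, shifted


rust_lines = [
    "fn get(v: &[u8], i: usize) -> u8 {",
    "    unsafe { *v.get_unchecked(i) }",
    "}",
]
mutated, shifted = insert_misleading_comments(rust_lines, [2])
```

The comment changes nothing about how the code runs, yet it asserts a safety property that the code does not enforce, which is exactly the kind of cue a robust detector should ignore.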

	## Rust-Specific
	Structural transformations that leverage Rust-specific syntax features.

	Mutations applied:
	- remove-comments: Remove all Rust comments
	- mizan-mut-derive-reorder: Randomly reorder traits in `#[derive(...)]` attributes
	- mizan-mut-trait-bound-reorder: Randomly reorder trait bounds in `where` clauses
	- mizan-mut-use-reorder: Randomly reorder items in `use` statements
	- mizan-mut-arithmetic-identity: Wrap integer literals in an identity operation (e.g., `N * 1`)
	- mizan-mut-explicit-where: Add an explicit `where` clause to function signatures
	- mizan-mut-rename-lifetime: Rename lifetime parameters in standalone functions
	- mizan-mut-extraneous-unsafe: Add extraneous `unsafe {...}` blocks around statements inside functions
	- mizan-mut-impl-trait-to-generic: Convert `impl Trait` bounds into generic parameters
	- mizan-mut-option-wrap: Wrap expressions in redundant `Some(...).unwrap()` calls
	- mizan-mut-maybeuninit-wrap: Wrap known-safe values in `MaybeUninit<T>`, automatically dereferencing them
	- mizan-mut-manuallydrop-wrap: Place owned variables in `ManuallyDrop` wrappers and later unwrap them
	- mizan-mut-explicit-return: Convert implicit (tail-expression) returns to explicit `return` statements
	- mizan-mut-unreachable-panic: Add an unreachable `panic!()` to function bodies
	- mizan-mut-repeated-shadowing: Add redundant repeated shadowing of `let` bindings within a scope
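
As an example of how small these transformations are, the arithmetic-identity mutation can be approximated with the toy sketch below. It is an assumption-laden illustration, not the RustMizan implementation: `arithmetic_identity` is a hypothetical helper that rewrites standalone decimal literals at the text level, so it would also touch literals in strings, comments, and patterns, whereas a real mutation would operate on the syntax tree.

```python
import re

# A decimal literal that is not part of an identifier, suffix, float, or range.
INT_LITERAL = re.compile("(?<![A-Za-z0-9_.])[0-9]+(?![A-Za-z0-9_.])")


def arithmetic_identity(source):
    # Rewrite each matched literal N as (N * 1), which evaluates to the same value.
    return INT_LITERAL.sub(lambda match: "(" + match.group(0) + " * 1)", source)


mutated_expr = arithmetic_identity("let n: u8 = 4 + x1;")
```

In this sketch the literal `4` becomes `(4 * 1)`, while the digits inside `u8` and `x1` are left alone because they are part of longer tokens.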
	"""

	CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
	CITATION_BUTTON_TEXT = r"""
	"""
