SavantCapital/datasets / README.md


	---
	license: apache-2.0

	authors:
	- Vinayak Patel

	task_categories:
	- question-answering
	- text-generation

	language:
	- en

	size_categories:
	- 10M<n<100M

	tags:
	- llm-benchmark
	- frontier-models
	- reasoning
	- recursive-reasoning
	- long-context
	- benchmark
	- evaluation
	- prompt-based-evaluation
	- encrypted-reasoning
	- recursive-execution
	- dependency-resolution
	- noisy-reasoning

	pretty_name: Vinayak Multistep Recursive Reasoning Benchmark

	configs:
	- config_name: questions
	  data_files:
	  - split: test
	    path: questionSheet.txt

	- config_name: answers
	  data_files:
	  - split: test
	    path: AnswerSheet.csv

	---

	# Vinayak Multistep Recursive Reasoning Benchmark (VMRRB)

	## Overview

	The Vinayak Multistep Recursive Reasoning Benchmark (VMRRB) is a large-scale prompt-based benchmark designed to evaluate advanced reasoning, recursive dependency resolution, encrypted task traversal, and robustness capabilities of frontier AI systems.

	The benchmark evaluates a model's ability to:
	- Perform recursive multistep reasoning
	- Resolve interdependent question chains
	- Execute encrypted dependency traversal
	- Perform chained decryption using computed answers
	- Parse noisy mathematical expressions
	- Maintain consistency across extremely long contexts
	- Handle recursive execution workflows
	- Preserve memory across large contextual inputs

	The benchmark contains approximately 10 million interdependent reasoning tasks.

	Unlike conventional QA datasets, VMRRB operates as a continuous evaluation prompt in which:
	- Questions are encrypted
	- Dependencies must be recursively resolved
	- Intermediate answers become future decryption keys
	- Noise must be filtered semantically
	- Questions must be solved in dependency order

	This benchmark is intended exclusively for evaluation and benchmarking of frontier AI systems.

	There is no training split provided.

	---

	## Files

	- questionSheet.txt
	- AnswerSheet.csv

	All files belong to the `test` split.

	---

	## Dataset Structure

	### questions config

	The `questions` configuration contains `questionSheet.txt`, which is a monolithic long-context evaluation prompt.

	The file contains:
	1. Benchmark execution instructions
	2. Recursive dependency rules
	3. Encrypted question traversal logic
	4. Noise-handling constraints
	5. Mathematical parsing rules
	6. Decryption code
	7. The encrypted benchmark question corpus

	The benchmark operates as a recursive execution chain.

	Questions may contain references such as:

	```text
	[ Answer N ]
	```

	which indicates that:
	- Question `N` must first be solved
	- Its computed answer must be substituted back into the current question
	- Dependencies must be resolved recursively
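
The substitution rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it works on already-decrypted, noise-free text, the placeholder pattern tolerates arbitrary whitespace inside the brackets, and `resolve`, `toy_evaluate`, and the three sample questions are hypothetical names and data, not part of the benchmark files.

```python
import ast
import operator
import re

# Matches placeholders such as "[ Answer 12 ]" (whitespace tolerance is an assumption).
ANSWER_REF = re.compile(r"\[\s*Answer\s+(\d+)\s*\]")

# Operators supported by the toy evaluator below.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}


def toy_evaluate(text):
    """Evaluate a plain arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {text!r}")
    return walk(ast.parse(text.strip(), mode="eval").body)


def resolve(n, questions, evaluate, cache=None, active=None):
    """Return the answer to question n, resolving its references depth-first."""
    cache = {} if cache is None else cache
    active = set() if active is None else active
    if n in cache:
        return cache[n]
    if n in active:
        raise ValueError(f"circular dependency at question {n}")
    active.add(n)

    def substitute(match):
        # Solve the referenced question first, then splice its answer in.
        return str(resolve(int(match.group(1)), questions, evaluate, cache, active))

    text = ANSWER_REF.sub(substitute, questions[n])
    active.discard(n)
    cache[n] = evaluate(text)
    return cache[n]


# Hypothetical, already-decrypted questions used only for illustration.
questions = {1: "2 + 3", 2: "[ Answer 1 ] * 4", 3: "[ Answer 2 ] - [ Answer 1 ]"}
result = resolve(3, questions, toy_evaluate)
```

The cache ensures each question is solved once even when several later questions reference it. Because the sketch recurses once per dependency level, very long chains would exceed Python's default recursion limit, so an explicit stack would be a safer choice at benchmark scale.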

	Additionally:
	- Questions are encrypted
	- Computed answers are used as decryption keys
	- Decrypted outputs may still contain semantic noise
	- Models must identify valid mathematical structure while ignoring irrelevant text

	The benchmark is designed to be processed as a single continuous context window whenever possible.

	Arbitrary chunking may invalidate dependency chains and decryption traversal logic.

	---

	### answers config

	The `answers` configuration contains `AnswerSheet.csv`, which stores the benchmark answer key.

	Schema:

	| Column | Description |
	|---|---|
	| `Question Number` | Sequential benchmark question identifier |
	| `Answer` | Final computed numeric answer |

	Example:

	```csv
	Question Number,Answer
	1,807.184
	2,6.148
	3,400.891
	```

	Question numbering corresponds to the ordering of benchmark questions within `questionSheet.txt`.
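
A minimal sketch of reading the answer key with Python's standard `csv` module follows. It assumes the header row matches the schema above exactly and that every answer parses as a float; `load_answer_key` is a hypothetical helper name.

```python
import csv


def load_answer_key(path="AnswerSheet.csv"):
    """Map each question number to its numeric answer."""
    answers = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            answers[int(row["Question Number"])] = float(row["Answer"])
    return answers
```

Calling `load_answer_key()` from the directory that holds `AnswerSheet.csv` returns a dictionary keyed by question number, which makes lookups during scoring straightforward.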

	---

	## Benchmark Workflow

	The intended benchmark workflow is:

	1. Start from the initial question specified in the "Start Guidance" section
	2. Decrypt the question using the provided password or dependency answer
	3. Remove semantic noise from the decrypted output
	4. Resolve referenced dependencies recursively
	5. Compute the final answer
	6. Use the computed answer as the decryption key for the next question
	7. Continue recursively until traversal is complete
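
The control flow of this workflow can be sketched as a loop in which each answer becomes the key for the next question. Everything concrete here is an assumption made for illustration: the real cipher, noise rules, and traversal order are defined inside `questionSheet.txt`, whereas this sketch uses a toy character-shift cipher, treats any character other than digits, `+`, and spaces as noise, visits questions in sequential order, and omits dependency resolution (step 4). All function names are hypothetical.

```python
import re


def toy_shift(key):
    """Derive a small character shift from a key string (toy scheme)."""
    return sum(ord(c) for c in key) % 7 + 1


def toy_encrypt(text, key):
    return "".join(chr(ord(c) + toy_shift(key)) for c in text)


def toy_decrypt(text, key):
    return "".join(chr(ord(c) - toy_shift(key)) for c in text)


def toy_strip_noise(text):
    """Keep only digits, '+', and spaces; everything else counts as noise here."""
    return re.sub(r"[^0-9+ ]", "", text)


def toy_solve(expression):
    """Sum the integer terms of an expression such as ' 2 + 3 '."""
    return sum(int(term) for term in expression.split("+"))


def run_chain(encrypted, start, password, decrypt, strip_noise, solve):
    """Solve questions start, start+1, ... using each answer as the next key."""
    answers = {}
    key = password
    n = start
    while n in encrypted:
        plaintext = decrypt(encrypted[n], key)   # step 2
        expression = strip_noise(plaintext)      # step 3
        answers[n] = solve(expression)           # step 5
        key = str(answers[n])                    # step 6
        n += 1                                   # step 7 (sequential order assumed)
    return answers


# Build a two-question toy chain: question 2 is encrypted with the answer to question 1.
encrypted = {
    1: toy_encrypt("zz 2 + 3 qq", "start"),
    2: toy_encrypt("noise 10 + 5", "5"),
}
answers = run_chain(encrypted, 1, "start", toy_decrypt, toy_strip_noise, toy_solve)
```

The point of the sketch is the data dependency: a wrong answer at any step yields a wrong key, so every later question in the chain decrypts to garbage.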

	The benchmark evaluates both reasoning correctness and recursive execution consistency.

	---

	## Reproducibility

	The benchmark includes the complete decryption logic and execution rules required for deterministic evaluation.

	---

	## Important Notes

	- This is not a conventional row-wise supervised dataset.
	- The benchmark is designed as a prompt-based recursive reasoning corpus.
	- Dependency chains must be resolved recursively before computing answers.
	- Intermediate answers may function as future decryption keys.
	- Decrypted questions may contain semantic noise that must be ignored.
	- Full-context loading is strongly recommended.
	- Arbitrary chunking may invalidate recursive traversal logic and dependency resolution.
	- Questions are not intended to be solved independently.

	---

	## Resource Requirements

	`questionSheet.txt` is intentionally large and designed for long-context evaluation.

	Substantial memory, compute, and context-window capacity may be required for full benchmark execution.
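
Before loading the file, a rough size check can indicate whether it could fit in a given context window. The sketch below assumes about four bytes per token, which is only a coarse heuristic that varies by tokenizer and content; `estimate_tokens` is a hypothetical helper name.

```python
import os


def estimate_tokens(path, bytes_per_token=4.0):
    """Return a coarse token estimate from the file size on disk."""
    return os.path.getsize(path) / bytes_per_token
```

For a precise count, tokenize the file with the tokenizer of the model under evaluation instead of relying on this estimate.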

	---

	## Warning

	VMRRB is intentionally designed to stress recursive execution, long-context reasoning, encrypted traversal, and dependency resolution capabilities.

	The benchmark may require substantial compute, memory, and context-window capacity for full evaluation.

	---

	## Example Usage

	```python
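	# Read the whole benchmark prompt; it is meant to be processed as one context.
	with open("questionSheet.txt", "r", encoding="utf-8") as f:
	    benchmark_prompt = f.read()

	# Feed the complete benchmark prompt into the evaluation model.
	# `model` is a placeholder for your own inference client; it is not defined here.
	response = model.generate(benchmark_prompt)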
	```

	Answers can be validated against `AnswerSheet.csv` using the corresponding `Question Number` field.
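
One way to score a run against the answer key is sketched below. It assumes the model's answers have already been parsed into a dictionary keyed by question number, and it uses an absolute tolerance of `1e-3` because the example answers carry three decimal places; the card does not specify a tolerance, so that value and the name `score` are assumptions.

```python
import math


def score(predicted, answer_key, abs_tol=1e-3):
    """Return the fraction of answer-key entries matched within abs_tol."""
    if not answer_key:
        return 0.0
    correct = sum(
        1
        for number, expected in answer_key.items()
        if number in predicted
        and math.isclose(predicted[number], expected, rel_tol=0.0, abs_tol=abs_tol)
    )
    return correct / len(answer_key)


# Answer key values are taken from the example rows in the answers config.
answer_key = {1: 807.184, 2: 6.148, 3: 400.891}
# Hypothetical model output: question 1 is close enough, 2 is off, 3 is missing.
predicted = {1: 807.1841, 2: 6.2}
accuracy = score(predicted, answer_key)
```

Unanswered questions count as incorrect here, which fits a chained benchmark where a model that stops early should not be scored only on the questions it reached.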

	---

	## Intended Usage

	- Long-context evaluation
	- Recursive dependency reasoning evaluation
	- Encrypted reasoning evaluation
	- Dependency traversal benchmarking
	- Memory and dependency tracking
	- Frontier AI evaluation
	- Retrieval-aware reasoning systems
	- Recursive execution evaluation
	- Robustness testing under chained reasoning tasks

	---

	## Limitations

	This dataset is intended solely for benchmarking and evaluation.

	It is not intended as:
	- A supervised training corpus
	- A standard row-wise machine learning dataset
	- A retrieval benchmark
	- A conventional mathematical QA dataset

	---

	## Citation

	If you use VMRRB in research or evaluation pipelines, please cite the dataset repository.

	---

	## Author

	Vinayak Patel

	---

	## License

	Apache License 2.0
