| # NanoCodeRAG / NanoCodeRAGProgrammingSolutions | |
| ## Overview | |
| CodeRAG-Bench uses programming-solution documents, including HumanEval- and | |
| MBPP-style canonical problems, as direct retrieval support for code generation. | |
| This Nano split keeps the prompt-to-solution view: a short natural-language | |
| Python programming request must retrieve the compact function that implements | |
| it. The sampled positives are short snippets for list manipulation, tuple | |
| sorting, monotonic-array checks, divisor sums, and similar operations, so the | |
| retriever must map intent to executable code even when the prompt and code | |
| share little surface wording. | |
| ## Details | |
| ### What the Original Data Measures | |
| [CodeRAG-Bench: Can Retrieval Augment Code Generation?](https://arxiv.org/abs/2406.14497) | |
| uses programming solutions as one of its five retrieval sources. The paper says | |
| these documents are created from basic programming problems with canonical | |
| solutions, such as HumanEval and MBPP, by concatenating the natural-language | |
| problem and program solution. In CodeRAG-Bench, such canonical documents can act | |
| as direct support for generation and as a retrieval-evaluation target. | |
| This Nano split isolates the programming-solution source: each query describes a | |
| small Python task, and the positive document is the code solution. Unlike | |
| documentation or tutorial retrieval, the positive may be only a few lines of code | |
| and may not repeat the query's descriptive words. | |
| ### Observed Data Profile | |
| The Nano split has 200 queries, 984 documents, and 200 positive qrel rows. Every | |
| query has one positive. Queries average 78.28 characters and are mostly prompts | |
| such as "Write a python function to check whether the given array is monotonic" | |
| or "find the sum of common divisors". Documents average only 189.05 characters | |
| and are compact Python functions. | |
| The sampled positives include one-line or short-loop implementations for list | |
| manipulation, tuple sorting, monotonic-array checks, divisor sums, and string | |
| editing. Some HumanEval-style queries in the observed sample are truncated to | |
| imports such as `from typing import List`, which makes the retrieval task much | |
| harder because the visible query contains almost no semantic problem statement. | |
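The import-only queries described above can be flagged before evaluation so their effect on scores can be reported separately. The following is a minimal sketch, not part of the dataset tooling: the helper name `is_import_only` is hypothetical, and it assumes a query counts as "import-only" when it parses as Python and contains nothing but import statements.

```python
import ast


def is_import_only(query: str) -> bool:
    """Return True when the query is valid Python made up solely of imports.

    Hypothetical helper for illustration. Natural-language prompts usually
    fail to parse, and comment-style prompts parse to an empty module, so
    both are reported as False.
    """
    text = query.strip()
    if not text:
        return False
    try:
        tree = ast.parse(text)
    except SyntaxError:
        # Plain English prompts are normally not valid Python.
        return False
    return bool(tree.body) and all(
        isinstance(node, (ast.Import, ast.ImportFrom)) for node in tree.body
    )
```

Applied to a list of query strings, `sum(is_import_only(q) for q in queries)` gives the number of queries that carry no problem statement under this heuristic.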
| ### BM25 Difficulty | |
| Using the dataset-provided BM25 candidate column, BM25 reaches nDCG@10 = 0.0138 | |
| and hit@10 = 0.0250. It ranks only one positive first and places only five of | |
| the 200 positives in the top 10. This is the hardest NanoCodeRAG split for BM25 | |
| by a wide margin. | |
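Because every query has exactly one positive, both metrics reduce to simple functions of the positive's rank: hit@10 is 1 when the positive appears in the top 10, and nDCG@10 is `1 / log2(rank + 1)` because the ideal DCG is 1. A hit@10 of 0.0250 over 200 queries corresponds to 5 hits. The sketch below shows that computation; the function names and the dictionary layout (`rankings`, `positives`) are assumptions for illustration, not the dataset's schema.

```python
import math
from typing import Dict, List, Sequence


def ndcg_at_k(ranked_ids: Sequence[str], positive_id: str, k: int = 10) -> float:
    """nDCG@k for a query with exactly one positive (ideal DCG is 1.0)."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == positive_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0


def hit_at_k(ranked_ids: Sequence[str], positive_id: str, k: int = 10) -> float:
    """1.0 when the positive appears in the top k, else 0.0."""
    return 1.0 if positive_id in ranked_ids[:k] else 0.0


def evaluate(
    rankings: Dict[str, List[str]],  # assumed layout: query id -> ranked doc ids
    positives: Dict[str, str],       # assumed layout: query id -> single positive doc id
    k: int = 10,
) -> Dict[str, float]:
    """Average both metrics over all queries that have a positive."""
    if not positives:
        raise ValueError("positives must not be empty")
    n = len(positives)
    ndcg = sum(ndcg_at_k(rankings.get(q, []), p, k) for q, p in positives.items()) / n
    hit = sum(hit_at_k(rankings.get(q, []), p, k) for q, p in positives.items()) / n
    return {"ndcg": ndcg, "hit": hit}
```

Queries missing from `rankings` are scored as misses, which keeps the denominator fixed at the number of judged queries.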
| The failure mode is structural. A natural-language prompt such as "sum of common | |
| divisors" must retrieve code using `%`, loops, and accumulator variables; lexical | |
| overlap is minimal. For some HumanEval items, the query text is only an import, | |
| so BM25 retrieves unrelated snippets sharing imports or common Python tokens. | |
| A useful retriever needs NL-to-code semantic matching and should recognize | |
| algorithmic behavior, not only shared words. | |
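The lexical gap can be made concrete by measuring token overlap between one of the example queries and its positive snippet. This is a rough sketch under a simplifying assumption: tokens are lowercase runs of letters and underscores, which is not the tokenizer behind the dataset's BM25 column. The helper names are hypothetical.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphabetic tokens; a deliberately crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Query and positive snippet taken from the example table in this card.
QUERY = "# Write a python function to find the sum of common divisors of two given numbers."
CODE = (
    "def sum(a,b): sum = 0 for i in range (1,min(a,b)): "
    "if (a % i == 0 and b % i == 0): sum += i return sum"
)

shared = tokens(QUERY) & tokens(CODE)
overlap = jaccard(tokens(QUERY), tokens(CODE))
```

Under this tokenization the only shared tokens are `sum` and `a`, and neither is distinctive: `a` is a stop word in the prompt, and `sum` is a common identifier across many snippets.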
| ### Training Data That May Help | |
| Useful training data includes HumanEval, MBPP, APPS, CodeContests, and | |
| CodeSearchNet-style natural-language-to-code pairs that do not overlap with this | |
| split, plus execution-verified code solutions with hard negatives from similar | |
| tasks. Training should exclude the NanoCodeRAG programming-solution evaluation | |
| queries, qrels, and positive solution snippets. | |
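One simple way to apply the exclusion rule is to drop any training pair whose prompt or code matches an evaluation item after light normalization. The sketch below is an assumption-laden starting point, not an audited procedure: `normalize` and `filter_leakage` are hypothetical names, and exact matching after normalization will miss paraphrased prompts and reformatted code, so a fuzzy or execution-based check would still be needed for a thorough audit.

```python
import re
from typing import Iterable, List, Tuple


def normalize(text: str) -> str:
    """Lowercase, drop a leading comment marker, and collapse whitespace."""
    text = text.strip().lstrip("#").lower()
    return re.sub(r"\s+", " ", text).strip()


def filter_leakage(
    train_pairs: Iterable[Tuple[str, str]],  # (prompt, code) training pairs
    eval_queries: Iterable[str],
    eval_solutions: Iterable[str],
) -> List[Tuple[str, str]]:
    """Keep only pairs whose prompt and code both differ from every evaluation item."""
    blocked_prompts = {normalize(q) for q in eval_queries}
    blocked_code = {normalize(s) for s in eval_solutions}
    return [
        (prompt, code)
        for prompt, code in train_pairs
        if normalize(prompt) not in blocked_prompts
        and normalize(code) not in blocked_code
    ]
```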
| Training pairs should preserve the code's identifiers, control flow, and | |
| algorithmic behavior. Pairs that include tests or input-output examples are useful | |
| because they teach retrievers that two prompts with similar wording can require | |
| different code. | |
| ### Synthetic Data Guidance | |
| For document-to-question generation, use non-evaluation Python functions and | |
| generate natural programming prompts that describe the function's behavior, | |
| inputs, and expected output. Include edge cases and examples when they are | |
| grounded in the code. | |
| For joint generation, create small executable Python functions plus prompts that | |
| ask for exactly that behavior. Hard negatives should solve nearby tasks with | |
| similar words but different conditions, such as sum vs product, first vs last, | |
| or increasing vs decreasing. Do not use Nano evaluation prompts or solution | |
| snippets as seeds. | |
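A hard negative is only useful if it really behaves differently from the positive. The sketch below checks this by running both functions on a set of probe inputs, using the sum vs product contrast mentioned above. All names here are hypothetical, and the check assumes both functions are safe to execute and do not raise on the probe inputs.

```python
from math import prod
from typing import Any, Callable, Iterable, Tuple


def sum_of_items(values):
    """Positive: returns the sum of the items."""
    return sum(values)


def product_of_items(values):
    """Candidate hard negative: similar wording, different operation."""
    return prod(values)


def differs_on_some_input(
    positive: Callable[..., Any],
    candidate: Callable[..., Any],
    probe_inputs: Iterable[Tuple[Any, ...]],
) -> bool:
    """Return True if the two functions disagree on at least one probe input."""
    for args in probe_inputs:
        if positive(*args) != candidate(*args):
            return True
    return False


# Sum and product coincide on [2, 2] and [1, 2, 3], so these probes alone
# would wrongly suggest the candidate is a duplicate of the positive.
weak_probes = [([2, 2],), ([1, 2, 3],)]
strong_probes = weak_probes + [([2, 3],)]
```

The weak probe set illustrates why probe inputs need variety: a candidate can look equivalent on a few inputs while implementing a different condition.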
| ## Example Data | |
| | Query | Positive document | | |
| | --- | --- | | |
| | # Write a python function to check whether the given array is monotonic or not. (79 chars) | def is_Monotonic(A): return (all(A[i] <= A[i + 1] for i in range(len(A) - 1)) or all(A[i] >= A[i + 1] for i in range(len(A) - 1))) (149 chars) | | |
| | # Write a python function to find the sum of common divisors of two given numbers. (82 chars) | def sum(a,b): sum = 0 for i in range (1,min(a,b)): if (a % i == 0 and b % i == 0): sum += i return sum (143 chars) | | |
| | # Write a function to add the given list to the given tuples. (61 chars) | def add_lists(test_list, test_tup): res = tuple(list(test_tup) + test_list) return (res) (94 chars) | | |
| | # Write a function to extract the index minimum value record from the given tuples. (83 chars) | from operator import itemgetter def index_minimum(test_list): res = min(test_list, key = itemgetter(1))[0] return (res) (127 chars) | | |
| | # Write a python function to check whether the sum of divisors are same or not. (79 chars) | import math def divSum(n): sum = 1; i = 2; while(i * i <= n): if (n % i == 0): sum = (sum + i +math.floor(n / i)); i += 1; return sum; def areEquivalent(num1,num2): return divSum(num1) == divSum(num2); (269 chars) | | |
| ## Dataset Information | |
| | Field | Value | | |
| | --- | --- | | |
| | Nano set | NanoCodeRAG | | |
| | Backing dataset | NanoCodeRAG | | |
| | Task / split | NanoCodeRAGProgrammingSolutions | | |
| | Hugging Face dataset | [hakari-bench/NanoCodeRAG](https://huggingface.co/datasets/hakari-bench/NanoCodeRAG) | | |
| | Language | en | | |
| | Category | code | | |
| | Queries | 200 | | |
| | Documents | 984 | | |
| | Positive qrels | 200 | | |
| | BM25 nDCG@10 | 0.0138 | | |
| | BM25 hit@10 | 0.0250 | | |
| | Query length avg chars | 78.28 | | |
| | Document length avg chars | 189.05 | | |
| ### Public Sources | |
| - [CodeRAG-Bench: Can Retrieval Augment Code Generation?](https://arxiv.org/abs/2406.14497); 2025; Zora Zhiruo Wang, Akari Asai, Xinyan Velocity Yu, Frank F. Xu, Yiqing Xie, Graham Neubig, and Daniel Fried; DOI: `10.18653/v1/2025.findings-naacl.176`. | |
| - [CodeRAG-Bench project page](https://code-rag-bench.github.io/). | |
| - [CodeRAG-Bench GitHub repository](https://github.com/code-rag-bench/code-rag-bench). | |
| - [code-rag-bench/programming-solutions dataset card](https://huggingface.co/datasets/code-rag-bench/programming-solutions). | |
| ### Hugging Face Links | |
| - Nano dataset: [hakari-bench/NanoCodeRAG](https://huggingface.co/datasets/hakari-bench/NanoCodeRAG) | |
| - Source dataset: [code-rag-bench/programming-solutions](https://huggingface.co/datasets/code-rag-bench/programming-solutions) | |
| ### Source Reference Table | |
| | Title | Year | Type | URL | | |
| | --- | ---: | --- | --- | | |
| | CodeRAG-Bench: Can Retrieval Augment Code Generation? | 2025 | arXiv paper | https://arxiv.org/abs/2406.14497 | | |
| | CodeRAG-Bench project page | 2025 | project page | https://code-rag-bench.github.io/ | | |
| | code-rag-bench/programming-solutions | 2024 | dataset card | https://huggingface.co/datasets/code-rag-bench/programming-solutions | | |
| ## Machine-Readable Metadata | |
| <!-- benchmark-task-metadata:v1 --> | |
| ```yaml | |
| benchmark_task_metadata: | |
| schema_version: 1 | |
| document_status: first_pass | |
| nano_set: NanoCodeRAG | |
| backing_dataset: NanoCodeRAG | |
| dataset_id: hakari-bench/NanoCodeRAG | |
| task_name: NanoCodeRAGProgrammingSolutions | |
| split_name: NanoCodeRAGProgrammingSolutions | |
| language: en | |
| category: code | |
| document_path: docs/benchmark_tasks/NanoCodeRAG/NanoCodeRAGProgrammingSolutions.md | |
| source_research: | |
| primary_source_type: benchmark_paper | |
| paper_pdf_or_html_checked: true | |
| paper_url: https://arxiv.org/abs/2406.14497 | |
| additional_source_urls: | |
| - https://aclanthology.org/2025.findings-naacl.176/ | |
| - https://code-rag-bench.github.io/ | |
| - https://github.com/code-rag-bench/code-rag-bench | |
| - https://huggingface.co/datasets/code-rag-bench/programming-solutions | |
| counts: | |
| queries: 200 | |
| documents: 984 | |
| positive_qrels: 200 | |
| positives_per_query: | |
| average: 1.0 | |
| min: 1 | |
| median: 1.0 | |
| max: 1 | |
| multi_positive_queries: 0 | |
| multi_positive_query_percent: 0.0 | |
| text_stats_chars: | |
| query_mean: 78.28 | |
| document_mean: 189.053862 | |
| bm25: | |
| ndcg_at_10: 0.0137666396 | |
| hit_at_10: 0.025 | |
| source: dataset_bm25_column | |
| learning: | |
| original_train_split: unknown | |
| evaluation_split_origin: CodeRAG-Bench programming solutions retrieval source sampled into NanoCodeRAG | |
| train_eval_overlap_audit: not_audited | |
| leakage_note: exclude NanoCodeRAG programming prompts, qrels, and positive solution snippets | |
| useful_training_data: | |
| - non-overlapping HumanEval and MBPP style prompt-to-code pairs | |
| - APPS and CodeContests natural-language-to-code solutions | |
| - CodeSearchNet summary-to-code retrieval pairs | |
| - execution-verified Python functions with behaviorally similar hard negatives | |
| synthetic_data: | |
| document_generation: small executable Python functions with identifiers, control flow, and edge-case behavior | |
| question_generation: natural programming prompts describing inputs, outputs, constraints, and examples | |
| answerability: the solution code should implement exactly the behavior requested by the prompt | |
| multi_positive_training: single_positive_question_document_focus | |
| links: | |
| nano_dataset: https://huggingface.co/datasets/hakari-bench/NanoCodeRAG | |
| source_urls: | |
| - label: CodeRAG-Bench arXiv | |
| url: https://arxiv.org/abs/2406.14497 | |
| - label: CodeRAG-Bench project page | |
| url: https://code-rag-bench.github.io/ | |
| - label: CodeRAG-Bench GitHub | |
| url: https://github.com/code-rag-bench/code-rag-bench | |
| - label: code-rag-bench/programming-solutions | |
| url: https://huggingface.co/datasets/code-rag-bench/programming-solutions | |
| source_notes: [] | |
| references: | |
| - title: "CodeRAG-Bench: Can Retrieval Augment Code Generation?" | |
| url: https://arxiv.org/abs/2406.14497 | |
| year: 2025 | |
| doi: 10.18653/v1/2025.findings-naacl.176 | |
| is_paper: true | |
| source_confidence: definitive_paper_link | |
| ``` | |