# NanoCodeRAG / NanoCodeRAGLibraryDocumentationSolutions

## Overview

CodeRAG-Bench studies whether retrieval can support code generation, and its
library-documentation source is built from official Python library references
collected through devdocs.io. This Nano task uses API names or short reference
descriptions as queries and retrieves documentation entries, often TensorFlow
pages. The observed records include signatures, aliases, arguments, examples,
and migration notes, so the task asks whether a retriever can find the exact
reference page that would ground API-aware generation.

## Details

### What the Original Data Measures

[CodeRAG-Bench: Can Retrieval Augment Code Generation?](https://arxiv.org/abs/2406.14497)
introduces a retrieval-augmented code generation benchmark with a heterogeneous
retrieval datastore. The paper reports five retrieval sources: programming
solutions, online tutorials, Python library documentation, Stack Overflow posts,
and GitHub files. For library documentation, it collects the official documentation
that devdocs.io provides for Python libraries. This source is intended especially to
help open-domain and repository-level programming tasks that require
library-specific functions.

The same paper manually annotates canonical documents for code-generation tasks
and evaluates retrieval with nDCG@10, precision, and recall. It also finds that
current retrievers still struggle when useful contexts have limited lexical
overlap. In this Nano split, the retrieval surface is the documentation source
itself: the correct document is the API documentation entry associated with the
query.
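
Because every query in this split has exactly one positive, the ideal DCG is 1. nDCG@10 therefore collapses to 1/log2(rank + 1) for a positive found at 1-based rank within the top 10, and to 0 otherwise. hit@10 is simply the fraction of queries whose positive reaches the top 10. The sketch below computes both. Its input shape (plain dicts keyed by query ID) is a hypothetical convenience and is not the dataset's schema.

```python
import math


def single_positive_metrics(rankings, positives, k=10):
    """Mean nDCG@k and hit@k when each query has exactly one relevant document.

    rankings:  dict query_id -> list of doc_ids, best first (hypothetical shape)
    positives: dict query_id -> the single relevant doc_id
    """
    ndcg_total = 0.0
    hits = 0
    for qid, positive in positives.items():
        top_k = rankings.get(qid, [])[:k]
        if positive in top_k:
            rank = top_k.index(positive) + 1  # 1-based rank
            # Ideal DCG is 1/log2(2) = 1, so nDCG is just the positive's discounted gain.
            ndcg_total += 1.0 / math.log2(rank + 1)
            hits += 1
    n = len(positives)
    return ndcg_total / n, hits / n


# Toy example: positive at rank 1, positive at rank 3, positive missing.
rankings = {"q1": ["d3", "d1"], "q2": ["d9", "d8", "d2"], "q3": ["d5"]}
positives = {"q1": "d3", "q2": "d2", "q3": "d7"}
ndcg, hit = single_positive_metrics(rankings, positives)
print(round(ndcg, 4), round(hit, 4))  # → 0.5 0.6667
```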

### Observed Data Profile

The Nano split has 200 queries, 8,683 documents, and 200 positive qrel rows.
Every query has one positive. Queries average 397.43 characters, but the median
is only 110 characters; the long tail comes from API entries whose query text
includes unusually long reference material. Documents average 2,045.70
characters, with some very long documentation pages.
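
The profile figures are simple aggregates over the split. The minimal sketch below assumes that queries and documents are dicts mapping ID to text and that qrels are (query ID, document ID) pairs. These shapes are hypothetical. The toy input shows how one long query pulls the mean far above the median, which is the same skew described above.

```python
import statistics
from collections import Counter


def profile_split(queries, documents, qrels):
    """Summarize a retrieval split.

    queries, documents: dict id -> text (hypothetical shape)
    qrels: list of (query_id, doc_id) positive pairs
    """
    positives_per_query = Counter(qid for qid, _ in qrels)
    query_lengths = [len(text) for text in queries.values()]
    doc_lengths = [len(text) for text in documents.values()]
    return {
        "queries": len(queries),
        "documents": len(documents),
        "positive_qrels": len(qrels),
        "multi_positive_queries": sum(1 for n in positives_per_query.values() if n > 1),
        "query_mean_chars": statistics.mean(query_lengths),
        "query_median_chars": statistics.median(query_lengths),
        "document_mean_chars": statistics.mean(doc_lengths),
    }


# Toy input: two short queries and one long-tail query.
queries = {"q1": "a" * 100, "q2": "b" * 110, "q3": "c" * 1000}
documents = {"d1": "x" * 2000, "d2": "y" * 100}
qrels = [("q1", "d1"), ("q2", "d2"), ("q3", "d1")]
stats = profile_split(queries, documents, qrels)
print(stats["query_mean_chars"], stats["query_median_chars"])
```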

The sampled data is dominated by TensorFlow-style API documentation: function or
class names such as `tf.autodiff.ForwardAccumulator`,
`tf.compat.v1.confusion_matrix`, and `tf.compat.v1.batch_to_space_nd`, followed
by a short description and alias notes. The relevant documents contain method
signatures, argument descriptions, examples, deprecation warnings, and migration
guidance.
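
Queries, and many documents, open with the dotted API path. That path and its parent namespace can therefore be separated from the prose summary before indexing or negative mining. The sketch below assumes the identifier is the leading dotted name in the text. `split_api_query` is a hypothetical helper and is not part of any dataset tooling.

```python
import re

# Leading dotted identifier such as "tf.compat.v1.confusion_matrix" (assumed layout).
API_PATH = re.compile(r"\s*([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+)")


def split_api_query(text):
    """Return (api_path, namespace, summary).

    If the text does not start with a dotted name, return (None, None, text).
    """
    match = API_PATH.match(text)
    if match is None:
        return None, None, text.strip()
    api_path = match.group(1)
    namespace = api_path.rsplit(".", 1)[0]  # parent module or class
    summary = text[match.end():].strip()
    return api_path, namespace, summary


print(split_api_query(
    "tf.compat.v1.confusion_matrix Computes the confusion matrix from predictions and labels."
))
# → ('tf.compat.v1.confusion_matrix', 'tf.compat.v1', 'Computes the confusion matrix from predictions and labels.')
```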

### BM25 Difficulty

Using the dataset-provided BM25 candidate column, BM25 reaches nDCG@10 = 0.2279
and hit@10 = 0.3800. BM25 ranks 19 positives first and finds 76 positives in the
top 10. This is a difficult lexical retrieval task because many documents repeat
generic documentation phrases such as "View aliases", "Compat aliases", and
"Migration guide", while the meaningful disambiguator may be a dotted API path.
Observed failures include TensorFlow AutoGraph and audio APIs where BM25 ranks
unrelated Keras constraint or optimizer documentation above the positive. A
strong retriever must preserve exact API identifiers and namespace structure,
while also using semantic clues from the short API summary.
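
One way to keep identifiers intact for a lexical retriever is a custom tokenizer. The sketch below emits each dotted path whole, then its namespace prefixes, then its lowercased components and snake_case pieces. The tokenizer is an illustrative assumption, not what the dataset's BM25 column used. Its output can be fed to any BM25 implementation that accepts pre-tokenized text.

```python
import re

# Words, optionally joined by dots into a path such as tf.compat.v1.confusion_matrix.
TOKEN = re.compile(r"[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*")


def identifier_tokens(text):
    """Tokenize for lexical retrieval while keeping dotted API paths whole."""
    tokens = []
    for match in TOKEN.finditer(text):
        word = match.group(0)
        parts = word.split(".")
        if len(parts) > 1:
            tokens.append(word)  # exact, case-preserved identifier
            # Namespace prefixes such as "tf.compat" and "tf.compat.v1".
            for i in range(2, len(parts)):
                tokens.append(".".join(parts[:i]))
        for part in parts:
            tokens.append(part.lower())  # lowercased component for partial matches
            if "_" in part:
                tokens.extend(p.lower() for p in part.split("_") if p)
    return tokens


print(identifier_tokens("tf.compat.v1.confusion_matrix Computes"))
```

Keeping both the whole path and its pieces should let an exact match on `tf.compat.v1.confusion_matrix` (a rare, high-IDF token) outscore pages that only share words like `confusion` or `matrix`. The pieces still support queries that name only part of the path.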

### Training Data That May Help

Useful training data includes non-overlapping Python API documentation retrieval,
DocPrompting-style natural-language intent to documentation pairs, API search
logs, docstring-to-reference retrieval, and library-specific examples paired
with the reference page that explains them. Training should exclude the
CodeRAG-Bench library-documentation evaluation queries, qrels, and positive
documentation entries used by this Nano split.
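
A first-pass exclusion filter can drop any training pair whose query matches an evaluation query, or whose document matches an evaluation positive. The minimal sketch below assumes exact matching after lowercasing and whitespace normalization. That is a weak assumption. Reformatted or near-duplicate pages would need a stronger check, such as comparing leading API paths or using fuzzy hashing.

```python
def filter_leakage(train_pairs, eval_queries, eval_positive_docs):
    """Drop training (query, document) pairs that touch the evaluation split.

    train_pairs: list of (query_text, doc_text)
    eval_queries, eval_positive_docs: iterables of evaluation texts
    """
    def norm(text):
        # Lowercase and collapse whitespace; an assumed, deliberately simple match key.
        return " ".join(text.lower().split())

    blocked_queries = {norm(q) for q in eval_queries}
    blocked_docs = {norm(d) for d in eval_positive_docs}
    return [
        (q, d)
        for q, d in train_pairs
        if norm(q) not in blocked_queries and norm(d) not in blocked_docs
    ]


pairs = [("tf.a.b Does X", "doc A"), ("tf.c.d Does Y", "doc B"), ("new query", "Doc   A")]
kept = filter_leakage(pairs, eval_queries=["tf.a.b does x"], eval_positive_docs=["doc a"])
print(kept)
```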

Models should be trained to keep identifiers intact: dotted module paths,
function names, argument names, and versioned aliases are often the decisive
tokens. Generic documentation boilerplate should be treated as weak evidence.
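
The bluntest way to treat boilerplate as weak evidence is to remove it before indexing or training. The sketch below strips phrases that appear in this split's example records. The phrase list is illustrative, not exhaustive. A softer alternative would be to down-weight these phrases instead of deleting them.

```python
import re

# Phrases seen in the example queries and documents; illustrative, not exhaustive.
BOILERPLATE = [
    "See Migration guide for more details.",
    "Compat aliases for migration",
    "View aliases",
]
BOILERPLATE_RE = re.compile("|".join(re.escape(p) for p in BOILERPLATE))


def strip_boilerplate(text):
    """Remove generic documentation phrases and collapse the leftover whitespace."""
    return " ".join(BOILERPLATE_RE.sub(" ", text).split())


print(strip_boilerplate(
    "tf.compat.v1.batch_to_space_nd BatchToSpace for N-D tensors of type T. "
    "View aliases Compat aliases for migration"
))
# → tf.compat.v1.batch_to_space_nd BatchToSpace for N-D tensors of type T.
```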

### Synthetic Data Guidance

For document-to-question generation, use non-evaluation API reference pages and
generate short programming questions, API-name lookups, and usage-intent queries
that are answerable from the selected documentation. Preserve signatures,
argument names, return types, warnings, and version-specific notes.
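
The following template-based sketch is a stand-in for model-based question generation. It shows only how the API path and argument names from a page's first call signature can be carried into generated queries verbatim. `signature_questions` and its question templates are hypothetical. The regex also assumes a flat signature and does not handle default values that contain parentheses.

```python
import re

# First call signature such as "tf.x.y( a, b=None )"; nested parentheses are not handled.
SIGNATURE = re.compile(r"([A-Za-z_][\w.]*)\(\s*([^)]*)\)")


def signature_questions(doc_text):
    """Build simple API-name and argument lookup queries from a documentation page."""
    match = SIGNATURE.search(doc_text)
    if match is None:
        return []
    api_path, arg_text = match.group(1), match.group(2)
    # Keep argument names exactly as written; drop default values.
    args = [a.split("=")[0].strip() for a in arg_text.split(",") if a.strip()]
    questions = [f"What does {api_path} do?"]
    questions += [f"What does the {arg} argument of {api_path} control?" for arg in args]
    return questions


doc = ("tf.compat.v1.batch_to_space_nd( input, block_shape, crops, name=None ) "
       "This operation reshapes the batch dimension.")
for question in signature_questions(doc):
    print(question)
```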

For joint generation, create realistic library documentation entries and
developer queries that ask how to use or locate an API. Hard negatives should be
nearby APIs in the same namespace or functions with similar boilerplate but
different behavior. Do not seed synthetic data with Nano evaluation queries or
positive documentation entries.
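
Same-namespace hard negatives can be mined by grouping API paths by their parent namespace. The minimal sketch below assumes you already have a list of dotted API paths from non-evaluation pages. The TensorFlow names in the example are for illustration only. The returned siblings would then be mapped back to their documentation pages to serve as negatives.

```python
from collections import defaultdict


def namespace_hard_negatives(api_paths, max_per_api=3):
    """Map each dotted API path to sibling APIs that share its parent namespace.

    Returns dict api_path -> up to max_per_api siblings, sorted for determinism.
    """
    unique = sorted(set(api_paths))
    by_namespace = defaultdict(list)
    for path in unique:
        by_namespace[path.rsplit(".", 1)[0]].append(path)
    negatives = {}
    for path in unique:
        siblings = [p for p in by_namespace[path.rsplit(".", 1)[0]] if p != path]
        negatives[path] = siblings[:max_per_api]
    return negatives


negatives = namespace_hard_negatives(
    ["tf.math.add", "tf.math.subtract", "tf.math.multiply", "tf.linalg.matmul"]
)
print(negatives["tf.math.add"])
```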

## Example Data

| Query | Positive document |
| --- | --- |
| tf.autodiff.ForwardAccumulator Computes Jacobian-vector products ("JVP"s) using forward-mode autodiff. (102 chars) | tf.autodiff.ForwardAccumulator( primals, tangents ) Compare to tf.GradientTape which computes vector-Jacobian products ("VJP"s) using reverse-mode autodiff (backprop). Reverse mode is more attractive when computing gradients ... [truncated 225 chars] (6087 chars) |
| tf.compat.v1.data.experimental.RandomDataset A Dataset of pseudorandom values. Inherits From: Dataset, Dataset (110 chars) | tf.compat.v1.data.experimental.RandomDataset( seed=None ) Attributes element_spec The type specification of an element of this dataset. dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3]) dataset.element_spec TensorSpec(s ... [truncated 225 chars] (55309 chars) |
| tf.compat.v1.confusion_matrix Computes the confusion matrix from predictions and labels. View aliases Compat aliases for migration (132 chars) | See Migration guide for more details. tf.compat.v1.math.confusion_matrix tf.compat.v1.confusion_matrix( labels, predictions, num_classes=None, dtype=tf.dtypes.int32, name=None, weights=None ) The matrix columns represent the ... [truncated 225 chars] (1943 chars) |
| tf.compat.v1.batch_to_space_nd BatchToSpace for N-D tensors of type T. View aliases Compat aliases for migration (114 chars) | See Migration guide for more details. tf.compat.v1.manip.batch_to_space_nd tf.compat.v1.batch_to_space_nd( input, block_shape, crops, name=None ) This operation reshapes the "batch" dimension 0 into M + 1 dimensions of shape ... [truncated 225 chars] (3558 chars) |
| tf.compat.v1.distribute.OneDeviceStrategy A distribution strategy for running on a single device. Inherits From: Strategy (121 chars) | tf.compat.v1.distribute.OneDeviceStrategy( device ) Using this strategy will place any variables created in its scope on the specified device. Input distributed through this strategy will be prefetched to the specified device ... [truncated 225 chars] (30793 chars) |

## Dataset Information

| Field | Value |
| --- | --- |
| Nano set | NanoCodeRAG |
| Backing dataset | NanoCodeRAG |
| Task / split | NanoCodeRAGLibraryDocumentationSolutions |
| Hugging Face dataset | [hakari-bench/NanoCodeRAG](https://huggingface.co/datasets/hakari-bench/NanoCodeRAG) |
| Language | en |
| Category | code |
| Queries | 200 |
| Documents | 8,683 |
| Positive qrels | 200 |
| BM25 nDCG@10 | 0.2279 |
| BM25 hit@10 | 0.3800 |
| Query length avg chars | 397.43 |
| Document length avg chars | 2,045.70 |

### Public Sources

- [CodeRAG-Bench: Can Retrieval Augment Code Generation?](https://arxiv.org/abs/2406.14497); 2025; Zora Zhiruo Wang, Akari Asai, Xinyan Velocity Yu, Frank F. Xu, Yiqing Xie, Graham Neubig, and Daniel Fried; DOI: `10.18653/v1/2025.findings-naacl.176`.
- [CodeRAG-Bench project page](https://code-rag-bench.github.io/).
- [CodeRAG-Bench GitHub repository](https://github.com/code-rag-bench/code-rag-bench).
- [code-rag-bench/library-documentation dataset card](https://huggingface.co/datasets/code-rag-bench/library-documentation).

### Hugging Face Links

- Nano dataset: [hakari-bench/NanoCodeRAG](https://huggingface.co/datasets/hakari-bench/NanoCodeRAG)
- Source dataset: [code-rag-bench/library-documentation](https://huggingface.co/datasets/code-rag-bench/library-documentation)

### Source Reference Table

| Title | Year | Type | URL |
| --- | ---: | --- | --- |
| CodeRAG-Bench: Can Retrieval Augment Code Generation? | 2025 | arXiv paper | https://arxiv.org/abs/2406.14497 |
| CodeRAG-Bench project page | 2025 | project page | https://code-rag-bench.github.io/ |
| code-rag-bench/library-documentation | 2024 | dataset card | https://huggingface.co/datasets/code-rag-bench/library-documentation |

## Machine-Readable Metadata

<!-- benchmark-task-metadata:v1 -->
```yaml
benchmark_task_metadata:
  schema_version: 1
  document_status: first_pass
  nano_set: NanoCodeRAG
  backing_dataset: NanoCodeRAG
  dataset_id: hakari-bench/NanoCodeRAG
  task_name: NanoCodeRAGLibraryDocumentationSolutions
  split_name: NanoCodeRAGLibraryDocumentationSolutions
  language: en
  category: code
  document_path: docs/benchmark_tasks/NanoCodeRAG/NanoCodeRAGLibraryDocumentationSolutions.md
  source_research:
    primary_source_type: benchmark_paper
    paper_pdf_or_html_checked: true
    paper_url: https://arxiv.org/abs/2406.14497
    additional_source_urls:
      - https://aclanthology.org/2025.findings-naacl.176/
      - https://code-rag-bench.github.io/
      - https://github.com/code-rag-bench/code-rag-bench
      - https://huggingface.co/datasets/code-rag-bench/library-documentation
  counts:
    queries: 200
    documents: 8683
    positive_qrels: 200
  positives_per_query:
    average: 1.0
    min: 1
    median: 1.0
    max: 1
    multi_positive_queries: 0
    multi_positive_query_percent: 0.0
  text_stats_chars:
    query_mean: 397.43
    document_mean: 2045.703098
  bm25:
    ndcg_at_10: 0.227871825
    hit_at_10: 0.38
    source: dataset_bm25_column
  learning:
    original_train_split: unknown
    evaluation_split_origin: CodeRAG-Bench library documentation retrieval source sampled into NanoCodeRAG
    train_eval_overlap_audit: not_audited
    leakage_note: exclude NanoCodeRAG library-documentation queries, qrels, and positive documentation entries
    useful_training_data:
      - non-overlapping Python API documentation retrieval pairs
      - DocPrompting-style natural-language intent to documentation pairs
      - docstring and example code to reference-page retrieval
      - library search logs and API usage examples with overlap removed
  synthetic_data:
    document_generation: realistic Python API documentation with signatures, parameters, examples, aliases, and version notes
    question_generation: API-name, usage-intent, and troubleshooting queries grounded in those documentation entries
    answerability: the selected document should contain the exact API behavior, signature, or argument needed by the query
    multi_positive_training: single_positive_question_document_focus
  links:
    nano_dataset: https://huggingface.co/datasets/hakari-bench/NanoCodeRAG
    source_urls:
      - label: CodeRAG-Bench arXiv
        url: https://arxiv.org/abs/2406.14497
      - label: CodeRAG-Bench project page
        url: https://code-rag-bench.github.io/
      - label: CodeRAG-Bench GitHub
        url: https://github.com/code-rag-bench/code-rag-bench
      - label: code-rag-bench/library-documentation
        url: https://huggingface.co/datasets/code-rag-bench/library-documentation
    source_notes: []
  references:
    - title: "CodeRAG-Bench: Can Retrieval Augment Code Generation?"
      url: https://arxiv.org/abs/2406.14497
      year: 2025
      doi: 10.18653/v1/2025.findings-naacl.176
      is_paper: true
      source_confidence: definitive_paper_link
```