Spaces:
Running
Running
| # NanoCodeRAG / NanoCodeRAGStackoverflowPosts | |
| ## Overview | |
| CodeRAG-Bench treats Stack Overflow posts as developer knowledge that can | |
| augment code generation, using question-answer posts from the StackExchange | |
| portion of RedPajama-1T as retrievable documents. In this Nano split, a | |
| programming question, usually beginning with a title and short problem | |
| description, must retrieve a long post containing answers, code examples, | |
| caveats, and discussion. The observed topics include Photoshop automation, | |
| locked files in C#, concurrent database editing, MySQL triggers, and IIS | |
| bandwidth behavior, so relevance is a practical fix or design answer rather | |
| than a generic topic match. | |
| ## Details | |
| ### What the Original Data Measures | |
| [CodeRAG-Bench: Can Retrieval Augment Code Generation?](https://arxiv.org/abs/2406.14497) | |
| collects Stack Overflow posts as one of its five developer retrieval sources, | |
| using the StackExchange split of RedPajama-1T. The paper treats each post as a | |
| retrievable document with a question, code responses, and textual explanations. | |
| Its open retrieval analysis reports that Stack Overflow posts can improve | |
| general programming generation, because retrieved posts may contain the same | |
| programming problem, code, and detailed explanations. | |
| This Nano split focuses on retrieving those community Q&A documents directly. | |
| The source is less formal than documentation: posts include multiple answers, | |
| partial fixes, warnings, tool recommendations, and conversational text. | |
| ### Observed Data Profile | |
| The Nano split has 200 queries, 10,000 documents, and 200 positive qrel rows. | |
| Every query has one positive. Queries average 209.84 characters and often begin | |
| with `Q:` followed by a title and problem details. Documents average 4,735.05 | |
| characters and may contain several answers, code examples, caveats, and links. | |
| The sampled queries include Mac font lookup from Photoshop automation, deleting | |
| a locked file in C#, concurrent database editing, MySQL trigger errors, and IIS | |
| bandwidth throttling. The positives are practical answer threads, not polished | |
| reference pages, so relevance may depend on matching the exact error condition | |
| or development environment. | |
| ### BM25 Difficulty | |
| Using the dataset-provided BM25 candidate column, BM25 reaches nDCG@10 = 0.6902 | |
| and hit@10 = 0.7950. It ranks 115 positives first and finds 159 positives in the | |
| top 10. Lexical matching is often strong because the query text is copied from | |
| the post's question and the answer thread repeats product names, languages, and | |
| error phrases. | |
| The misses are usually near-neighbor Q&A failures. For TortoiseSVN branching, | |
| BM25 retrieves another version-control discussion before the positive. For SQL | |
| Server's equivalent of MySQL `REPLACE INTO`, it retrieves a database hosting | |
| cost discussion. The retriever must distinguish the requested operation, | |
| platform, and tool from other posts with similar technology words. | |
| ### Training Data That May Help | |
| Useful training data includes non-overlapping Stack Overflow question-to-answer | |
| thread retrieval, duplicate-question retrieval, issue-to-fix pairs, API usage | |
| Q&A, and documentation-linked Q&A. Training should exclude the NanoCodeRAG Stack | |
| Overflow evaluation queries, qrels, and positive posts. | |
| Community Q&A data benefits from negatives that share tags or error messages but | |
| answer a different problem. Models should learn to use both the question title | |
| and body, because the title alone may be too broad. | |
| ### Synthetic Data Guidance | |
| For document-to-question generation, use non-evaluation Q&A posts and generate | |
| developer questions that preserve the language, framework, error, environment, | |
| and desired operation. The selected post should contain a usable answer, warning, | |
| or workaround. | |
| For joint generation, create realistic Stack Overflow-style threads with a | |
| question, accepted answer, alternative answers, code snippets, and caveats. Hard | |
| negatives should share the same tags or tool names but solve a different | |
| failure mode. Do not use Nano evaluation queries or positive posts as seeds. | |
| ## Example Data | |
| | Query | Positive document | | |
| | --- | --- | | |
| | Q: How can I find the full path to a font from its display name on a Mac? I am using the Photoshop's javascript API to find the fonts in a given PSD. (149 chars) | Given a font name returned by the API, I want to find the actual physical font file that font name corresponds to on the disc. This is all happening in a python program running on OSX so I guess I'm looking for one of: * *Som ... [truncated 225 chars](5076 chars) | | |
| | Q: How do I delete a file which is locked by another process in C#? I'm looking for a way to delete a file which is locked by another process using C#. I suspect the method must be able to find which process is locking the fi ... [truncated 225 chars](396 chars) | A: If you want to do it programmatically. I'm not sure... and I'd really recommend against it. If you're just troubleshooting stuff on your own machine, SysInternals Process Explorer can help you Run it, use the Find Handle c ... [truncated 225 chars](13199 chars) | | |
| | Q: Editing database records by multiple users I have designed database tables (normalised, on an MS SQL server) and created a standalone windows front end for an application that will be used by a handful of users to add and ... [truncated 225 chars](334 chars) | I am concerned that if two users start editing the same record then the last to commit the update would be the 'winner' and important information may be lost. A number of solutions come to mind but I'm not sure if I am going ... [truncated 225 chars](4026 chars) | | |
| | Q: Throw an error preventing a table update in a MySQL trigger If I have a trigger before the update on a table, how can I throw an error that prevents the update on that table? (177 chars) | A: CREATE TRIGGER sample_trigger_msg BEFORE INSERT FOR EACH ROW BEGIN IF(NEW.important_value) < (1*2) THEN DECLARE dummy INT; SELECT Enter your Message Here!!! INTO dummy FROM mytable WHERE mytable.id=new.id END IF; END; A: H ... [truncated 225 chars](5314 chars) | | |
| | Q: Bandwith throttling in IIS 6 by IP Address I am writing an application that downloads large files in the background. All clients are logged in locally, or through a VPN. When they are logged in locally, I do not want to th ... [truncated 225 chars](391 chars) | Since this is an AIR Application, I figure I will throttle via server-side since I can do it from either the server itself (IIS 6) or the web service (asp.net / C#). Throttling through IIS 6 seems to work fine, but it seems l ... [truncated 225 chars](922 chars) | | |
| ## Dataset Information | |
| | Field | Value | | |
| | --- | --- | | |
| | Nano set | NanoCodeRAG | | |
| | Backing dataset | NanoCodeRAG | | |
| | Task / split | NanoCodeRAGStackoverflowPosts | | |
| | Hugging Face dataset | [hakari-bench/NanoCodeRAG](https://huggingface.co/datasets/hakari-bench/NanoCodeRAG) | | |
| | Language | en | | |
| | Category | code | | |
| | Queries | 200 | | |
| | Documents | 10,000 | | |
| | Positive qrels | 200 | | |
| | BM25 nDCG@10 | 0.6902 | | |
| | BM25 hit@10 | 0.7950 | | |
| | Query length avg chars | 209.84 | | |
| | Document length avg chars | 4,735.05 | | |
| ### Public Sources | |
| - [CodeRAG-Bench: Can Retrieval Augment Code Generation?](https://arxiv.org/abs/2406.14497); 2025; Zora Zhiruo Wang, Akari Asai, Xinyan Velocity Yu, Frank F. Xu, Yiqing Xie, Graham Neubig, and Daniel Fried; DOI: `10.18653/v1/2025.findings-naacl.176`. | |
| - [CodeRAG-Bench project page](https://code-rag-bench.github.io/). | |
| - [CodeRAG-Bench GitHub repository](https://github.com/code-rag-bench/code-rag-bench). | |
| - [code-rag-bench/stackoverflow-posts dataset card](https://huggingface.co/datasets/code-rag-bench/stackoverflow-posts). | |
| ### Hugging Face Links | |
| - Nano dataset: [hakari-bench/NanoCodeRAG](https://huggingface.co/datasets/hakari-bench/NanoCodeRAG) | |
| - Source dataset: [code-rag-bench/stackoverflow-posts](https://huggingface.co/datasets/code-rag-bench/stackoverflow-posts) | |
| ### Source Reference Table | |
| | Title | Year | Type | URL | | |
| | --- | ---: | --- | --- | | |
| | CodeRAG-Bench: Can Retrieval Augment Code Generation? | 2025 | arXiv paper | https://arxiv.org/abs/2406.14497 | | |
| | CodeRAG-Bench project page | 2025 | project page | https://code-rag-bench.github.io/ | | |
| | code-rag-bench/stackoverflow-posts | 2024 | dataset card | https://huggingface.co/datasets/code-rag-bench/stackoverflow-posts | | |
| ## Machine-Readable Metadata | |
| <!-- benchmark-task-metadata:v1 --> | |
| ```yaml | |
| benchmark_task_metadata: | |
| schema_version: 1 | |
| document_status: first_pass | |
| nano_set: NanoCodeRAG | |
| backing_dataset: NanoCodeRAG | |
| dataset_id: hakari-bench/NanoCodeRAG | |
| task_name: NanoCodeRAGStackoverflowPosts | |
| split_name: NanoCodeRAGStackoverflowPosts | |
| language: en | |
| category: code | |
| document_path: docs/benchmark_tasks/NanoCodeRAG/NanoCodeRAGStackoverflowPosts.md | |
| source_research: | |
| primary_source_type: benchmark_paper | |
| paper_pdf_or_html_checked: true | |
| paper_url: https://arxiv.org/abs/2406.14497 | |
| additional_source_urls: | |
| - https://aclanthology.org/2025.findings-naacl.176/ | |
| - https://code-rag-bench.github.io/ | |
| - https://github.com/code-rag-bench/code-rag-bench | |
| - https://huggingface.co/datasets/code-rag-bench/stackoverflow-posts | |
| counts: | |
| queries: 200 | |
| documents: 10000 | |
| positive_qrels: 200 | |
| positives_per_query: | |
| average: 1.0 | |
| min: 1 | |
| median: 1.0 | |
| max: 1 | |
| multi_positive_queries: 0 | |
| multi_positive_query_percent: 0.0 | |
| text_stats_chars: | |
| query_mean: 209.835 | |
| document_mean: 4735.0462 | |
| bm25: | |
| ndcg_at_10: 0.6901992074 | |
| hit_at_10: 0.795 | |
| source: dataset_bm25_column | |
| learning: | |
| original_train_split: unknown | |
| evaluation_split_origin: CodeRAG-Bench Stack Overflow posts retrieval source sampled into NanoCodeRAG | |
| train_eval_overlap_audit: not_audited | |
| leakage_note: exclude NanoCodeRAG Stack Overflow queries, qrels, and positive posts | |
| useful_training_data: | |
| - non-overlapping Stack Overflow question-to-answer thread retrieval | |
| - duplicate-question and related-question retrieval pairs | |
| - issue-to-fix and API usage Q&A pairs | |
| - documentation-linked Q&A with tag-matched hard negatives | |
| synthetic_data: | |
| document_generation: realistic Stack Overflow-style threads with question, accepted answer, alternative answers, code snippets, caveats, and environment details | |
| question_generation: developer questions preserving language, framework, error message, and desired operation | |
| answerability: the selected post should contain a usable answer, workaround, warning, or API usage pattern | |
| multi_positive_training: single_positive_question_document_focus | |
| links: | |
| nano_dataset: https://huggingface.co/datasets/hakari-bench/NanoCodeRAG | |
| source_urls: | |
| - label: CodeRAG-Bench arXiv | |
| url: https://arxiv.org/abs/2406.14497 | |
| - label: CodeRAG-Bench project page | |
| url: https://code-rag-bench.github.io/ | |
| - label: CodeRAG-Bench GitHub | |
| url: https://github.com/code-rag-bench/code-rag-bench | |
| - label: code-rag-bench/stackoverflow-posts | |
| url: https://huggingface.co/datasets/code-rag-bench/stackoverflow-posts | |
| source_notes: [] | |
| references: | |
| - title: "CodeRAG-Bench: Can Retrieval Augment Code Generation?" | |
| url: https://arxiv.org/abs/2406.14497 | |
| year: 2025 | |
| doi: 10.18653/v1/2025.findings-naacl.176 | |
| is_paper: true | |
| source_confidence: definitive_paper_link | |
| ``` | |