Spaces:
Running
NanoCodeRAG / NanoCodeRAGStackoverflowPosts
Overview
CodeRAG-Bench treats Stack Overflow posts as developer knowledge that can augment code generation, using question-answer posts from the StackExchange portion of RedPajama-1T as retrievable documents. In this Nano split, a programming question, usually beginning with a title and short problem description, must retrieve a long post containing answers, code examples, caveats, and discussion. The observed topics include Photoshop automation, locked files in C#, concurrent database editing, MySQL triggers, and IIS bandwidth behavior, so relevance is a practical fix or design answer rather than a generic topic match.
Details
What the Original Data Measures
CodeRAG-Bench: Can Retrieval Augment Code Generation? collects Stack Overflow posts as one of its five developer retrieval sources, using the StackExchange split of RedPajama-1T. The paper treats each post as a retrievable document with a question, code responses, and textual explanations. Its open retrieval analysis reports that Stack Overflow posts can improve general programming generation, because retrieved posts may contain the same programming problem, code, and detailed explanations.
This Nano split focuses on retrieving those community Q&A documents directly. The source is less formal than documentation: posts include multiple answers, partial fixes, warnings, tool recommendations, and conversational text.
Observed Data Profile
The Nano split has 200 queries, 10,000 documents, and 200 positive qrel rows.
Every query has one positive. Queries average 209.84 characters and often begin
with Q: followed by a title and problem details. Documents average 4,735.05
characters and may contain several answers, code examples, caveats, and links.
The sampled queries include Mac font lookup from Photoshop automation, deleting a locked file in C#, concurrent database editing, MySQL trigger errors, and IIS bandwidth throttling. The positives are practical answer threads, not polished reference pages, so relevance may depend on matching the exact error condition or development environment.
BM25 Difficulty
Using the dataset-provided BM25 candidate column, BM25 reaches nDCG@10 = 0.6902 and hit@10 = 0.7950. It ranks 115 positives first and finds 159 positives in the top 10. Lexical matching is often strong because the query text is copied from the post's question and the answer thread repeats product names, languages, and error phrases.
The misses are usually near-neighbor Q&A failures. For TortoiseSVN branching,
BM25 retrieves another version-control discussion before the positive. For SQL
Server's equivalent of MySQL REPLACE INTO, it retrieves a database hosting
cost discussion. The retriever must distinguish the requested operation,
platform, and tool from other posts with similar technology words.
Training Data That May Help
Useful training data includes non-overlapping Stack Overflow question-to-answer thread retrieval, duplicate-question retrieval, issue-to-fix pairs, API usage Q&A, and documentation-linked Q&A. Training should exclude the NanoCodeRAG Stack Overflow evaluation queries, qrels, and positive posts.
Community Q&A data benefits from negatives that share tags or error messages but answer a different problem. Models should learn to use both the question title and body, because the title alone may be too broad.
Synthetic Data Guidance
For document-to-question generation, use non-evaluation Q&A posts and generate developer questions that preserve the language, framework, error, environment, and desired operation. The selected post should contain a usable answer, warning, or workaround.
For joint generation, create realistic Stack Overflow-style threads with a question, accepted answer, alternative answers, code snippets, and caveats. Hard negatives should share the same tags or tool names but solve a different failure mode. Do not use Nano evaluation queries or positive posts as seeds.
Example Data
| Query | Positive document |
|---|---|
| Q: How can I find the full path to a font from its display name on a Mac? I am using the Photoshop's javascript API to find the fonts in a given PSD. (149 chars) | Given a font name returned by the API, I want to find the actual physical font file that font name corresponds to on the disc. This is all happening in a python program running on OSX so I guess I'm looking for one of: * *Som ... [truncated 225 chars](5076 chars) |
| Q: How do I delete a file which is locked by another process in C#? I'm looking for a way to delete a file which is locked by another process using C#. I suspect the method must be able to find which process is locking the fi ... [truncated 225 chars](396 chars) | A: If you want to do it programmatically. I'm not sure... and I'd really recommend against it. If you're just troubleshooting stuff on your own machine, SysInternals Process Explorer can help you Run it, use the Find Handle c ... [truncated 225 chars](13199 chars) |
| Q: Editing database records by multiple users I have designed database tables (normalised, on an MS SQL server) and created a standalone windows front end for an application that will be used by a handful of users to add and ... [truncated 225 chars](334 chars) | I am concerned that if two users start editing the same record then the last to commit the update would be the 'winner' and important information may be lost. A number of solutions come to mind but I'm not sure if I am going ... [truncated 225 chars](4026 chars) |
| Q: Throw an error preventing a table update in a MySQL trigger If I have a trigger before the update on a table, how can I throw an error that prevents the update on that table? (177 chars) | A: CREATE TRIGGER sample_trigger_msg BEFORE INSERT FOR EACH ROW BEGIN IF(NEW.important_value) < (1*2) THEN DECLARE dummy INT; SELECT Enter your Message Here!!! INTO dummy FROM mytable WHERE mytable.id=new.id END IF; END; A: H ... [truncated 225 chars](5314 chars) |
| Q: Bandwith throttling in IIS 6 by IP Address I am writing an application that downloads large files in the background. All clients are logged in locally, or through a VPN. When they are logged in locally, I do not want to th ... [truncated 225 chars](391 chars) | Since this is an AIR Application, I figure I will throttle via server-side since I can do it from either the server itself (IIS 6) or the web service (asp.net / C#). Throttling through IIS 6 seems to work fine, but it seems l ... [truncated 225 chars](922 chars) |
Dataset Information
| Field | Value |
|---|---|
| Nano set | NanoCodeRAG |
| Backing dataset | NanoCodeRAG |
| Task / split | NanoCodeRAGStackoverflowPosts |
| Hugging Face dataset | hakari-bench/NanoCodeRAG |
| Language | en |
| Category | code |
| Queries | 200 |
| Documents | 10,000 |
| Positive qrels | 200 |
| BM25 nDCG@10 | 0.6902 |
| BM25 hit@10 | 0.7950 |
| Query length avg chars | 209.84 |
| Document length avg chars | 4,735.05 |
Public Sources
- CodeRAG-Bench: Can Retrieval Augment Code Generation?; 2025; Zora Zhiruo Wang, Akari Asai, Xinyan Velocity Yu, Frank F. Xu, Yiqing Xie, Graham Neubig, and Daniel Fried; DOI:
10.18653/v1/2025.findings-naacl.176. - CodeRAG-Bench project page.
- CodeRAG-Bench GitHub repository.
- code-rag-bench/stackoverflow-posts dataset card.
Hugging Face Links
- Nano dataset: hakari-bench/NanoCodeRAG
- Source dataset: code-rag-bench/stackoverflow-posts
Source Reference Table
| Title | Year | Type | URL |
|---|---|---|---|
| CodeRAG-Bench: Can Retrieval Augment Code Generation? | 2025 | arXiv paper | https://arxiv.org/abs/2406.14497 |
| CodeRAG-Bench project page | 2025 | project page | https://code-rag-bench.github.io/ |
| code-rag-bench/stackoverflow-posts | 2024 | dataset card | https://huggingface.co/datasets/code-rag-bench/stackoverflow-posts |
Machine-Readable Metadata
benchmark_task_metadata:
schema_version: 1
document_status: first_pass
nano_set: NanoCodeRAG
backing_dataset: NanoCodeRAG
dataset_id: hakari-bench/NanoCodeRAG
task_name: NanoCodeRAGStackoverflowPosts
split_name: NanoCodeRAGStackoverflowPosts
language: en
category: code
document_path: docs/benchmark_tasks/NanoCodeRAG/NanoCodeRAGStackoverflowPosts.md
source_research:
primary_source_type: benchmark_paper
paper_pdf_or_html_checked: true
paper_url: https://arxiv.org/abs/2406.14497
additional_source_urls:
- https://aclanthology.org/2025.findings-naacl.176/
- https://code-rag-bench.github.io/
- https://github.com/code-rag-bench/code-rag-bench
- https://huggingface.co/datasets/code-rag-bench/stackoverflow-posts
counts:
queries: 200
documents: 10000
positive_qrels: 200
positives_per_query:
average: 1.0
min: 1
median: 1.0
max: 1
multi_positive_queries: 0
multi_positive_query_percent: 0.0
text_stats_chars:
query_mean: 209.835
document_mean: 4735.0462
bm25:
ndcg_at_10: 0.6901992074
hit_at_10: 0.795
source: dataset_bm25_column
learning:
original_train_split: unknown
evaluation_split_origin: CodeRAG-Bench Stack Overflow posts retrieval source sampled into NanoCodeRAG
train_eval_overlap_audit: not_audited
leakage_note: exclude NanoCodeRAG Stack Overflow queries, qrels, and positive posts
useful_training_data:
- non-overlapping Stack Overflow question-to-answer thread retrieval
- duplicate-question and related-question retrieval pairs
- issue-to-fix and API usage Q&A pairs
- documentation-linked Q&A with tag-matched hard negatives
synthetic_data:
document_generation: realistic Stack Overflow-style threads with question, accepted answer, alternative answers, code snippets, caveats, and environment details
question_generation: developer questions preserving language, framework, error message, and desired operation
answerability: the selected post should contain a usable answer, workaround, warning, or API usage pattern
multi_positive_training: single_positive_question_document_focus
links:
nano_dataset: https://huggingface.co/datasets/hakari-bench/NanoCodeRAG
source_urls:
- label: CodeRAG-Bench arXiv
url: https://arxiv.org/abs/2406.14497
- label: CodeRAG-Bench project page
url: https://code-rag-bench.github.io/
- label: CodeRAG-Bench GitHub
url: https://github.com/code-rag-bench/code-rag-bench
- label: code-rag-bench/stackoverflow-posts
url: https://huggingface.co/datasets/code-rag-bench/stackoverflow-posts
source_notes: []
references:
- title: "CodeRAG-Bench: Can Retrieval Augment Code Generation?"
url: https://arxiv.org/abs/2406.14497
year: 2025
doi: 10.18653/v1/2025.findings-naacl.176
is_paper: true
source_confidence: definitive_paper_link