# NanoDAPFAM / NanoDAPFAMInTitlAbsToFullText

## Overview

`NanoDAPFAMInTitlAbsToFullText` retrieves full-text patent-family records from title-abstract queries. Positives are IN-domain DAPFAM citation links where the query and target share an IPC3 class.

## Details

### What the Original Data Measures

[DAPFAM: A Domain-Aware Family-level Dataset to benchmark cross domain patent retrieval](https://arxiv.org/abs/2506.22141) reports that domain partitions are central to understanding patent retrieval difficulty. This split represents the same-domain setting with compact queries and full target documents.

### Observed Data Profile

The Nano split has 200 queries, 10,000 documents, and 3,072 positive qrels. Queries average 771.33 characters, and full-text targets average 68,924.34 characters. Positives per query average 15.36. The query title and abstract usually identify the technical problem and proposed solution, while full target records contain detailed descriptions and claims from same-domain patent families.

### BM25 Difficulty

Using the dataset-provided BM25 candidate column and the best-ranked positive per query, BM25 reaches nDCG@10 = 0.6643 and hit@10 = 0.8500. It ranks a positive first for 95 of the 200 queries and finds a positive in the top 100 for every query. The same-domain restriction and long targets make lexical matching effective, though the task still requires choosing among many related patents.

### Training Data That May Help

Useful data includes same-domain title-abstract prior-art search against full patent records, family-level citation retrieval, and patent semantic search outside NanoDAPFAM evaluation families.

### Synthetic Data Guidance

Generate compact source summaries and long same-domain target records. Positives should share a cited technical mechanism, while hard negatives should have similar IPC terms but no direct prior-art relation.
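The positive and hard-negative rule in the synthetic data guidance can be sketched as a small labeling function. This is a minimal illustration, not the DAPFAM construction procedure: the `Family` record, its field names, and the `label_pair` helper are hypothetical, a shared IPC3 class stands in for "similar IPC terms", and a citation in either direction is assumed to count as a prior-art relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Family:
    """Hypothetical patent-family record; all field names are illustrative."""
    family_id: str
    ipc3: frozenset                 # IPC3 classes, e.g. frozenset({"E01"})
    cites: frozenset = frozenset()  # family_ids this family cites


def label_pair(query: Family, target: Family) -> str:
    """Label a (query, target) pair for same-domain training data."""
    shares_ipc3 = bool(query.ipc3 & target.ipc3)
    # Assumption: a citation in either direction counts as a prior-art link.
    linked = target.family_id in query.cites or query.family_id in target.cites
    if shares_ipc3 and linked:
        return "positive"
    if shares_ipc3:
        return "hard_negative"  # same domain, no direct prior-art relation
    return "other"  # different domain: out of scope for this split


# Toy usage with made-up family IDs.
query = Family("Q1", frozenset({"E01"}), cites=frozenset({"F1"}))
cited_same_domain = Family("F1", frozenset({"E01"}))
uncited_same_domain = Family("F2", frozenset({"E01", "B60"}))
other_domain = Family("F3", frozenset({"H01"}))

labels = [
    label_pair(query, cited_same_domain),
    label_pair(query, uncited_same_domain),
    label_pair(query, other_domain),
]
print(labels)
```

Pairs labeled `other` are dropped rather than used as negatives here, since a cross-domain document is likely an easy negative for a same-domain query.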
## Example Data

| Query | Positive document |
| --- | --- |
| snow removal equipment with automatic walking function the invention relates to snow removal equipment with an automatic walking function. the snow removal equipment comprises a walking module, a working module and a control ... [truncated 225 chars](988 chars) | multifunctional device for clearing snow an apparatus and method for clearing an accumulation of matter from a surface that includes a blade configured to collect matter upon movement of the apparatus and means to shift the c ... [truncated 225 chars](59310 chars) |
| waste disposal devices waste disposal device including a housing defining a waste compartment for receiving enclosed waste and arranged to removably receive a cartridge containing a length of flexible tubing which operatively ... [truncated 225 chars](891 chars) | cassette for dispensing pleated tubing a cassette for use in dispensing a pleated tubing. the cassette includes an annular body having a generally u shaped housing with an open central cylindrical core. the annular body inclu ... [truncated 225 chars](36269 chars) |
| an article including identification for use in an electrically heated smoking system. there is provided an electrically heated smoking system (101) for receiving a smoking article (115) or cleaning article (205) configured fo ... [truncated 225 chars](1164 chars) | apparatus for generating aerosol from an aerosolisable medium, an article of aerosolisable medium and a method of determining a parameter of an article to provide an apparatus that heats an aerosolizable medium to volatilize ... [truncated 225 chars](52756 chars) |
| low weight carpet and carpet tile and methods of manufacture low weight and non-square carpet tile suitable for use in mass transit vehicles, particularly passenger aircraft. the carpet tile preferably weighs less than about ... [truncated 225 chars](565 chars) | anti-static mats and carpets a novel carpet material or mat which is characterized by an extraordinary ability to quickly and comfortably discharge any build-up of a static electricity charge on a person who has built up such ... [truncated 225 chars](17195 chars) |
| organosilicon precursors for interlayer dielectric films with low dielectric constants a method of forming a low dielectric constant interlayer dielectric film on a substrate by reacting, under chemical vapor deposition condi ... [truncated 225 chars](546 chars) | radiation shield a radiation shield and an assembly and a reactor including the radiation shield are disclosed. the radiation shield can be used to control heat flux from a susceptor heater assembly and thereby enable better ... [truncated 225 chars](48536 chars) |

## Dataset Information

| Field | Value |
| --- | --- |
| Nano set | NanoDAPFAM |
| Backing dataset | NanoDAPFAM |
| Task / split | NanoDAPFAMInTitlAbsToFullText |
| Hugging Face dataset | [hakari-bench/NanoDAPFAM](https://huggingface.co/datasets/hakari-bench/NanoDAPFAM) |
| Language | en |
| Category | natural_language |
| Queries | 200 |
| Documents | 10000 |
| Positive qrels | 3072 |
| Positives per query | avg 15.36, min 1, median 18.0, max 20 |
| BM25 nDCG@10 | 0.6643 |
| BM25 hit@10 | 0.8500 |
| Query length avg chars | 771.33 |
| Document length avg chars | 68924.34 |

### Public Sources

- [DAPFAM: A Domain-Aware Family-level Dataset to benchmark cross domain patent retrieval](https://arxiv.org/abs/2506.22141); 2026; Iliass Ayaou, Denis Cavallucci, and Hicham Chibane; DOI: `10.1016/j.array.2026.100720`.
- [DAPFAM DOI record](https://doi.org/10.1016/j.array.2026.100720).
- [datalyes/DAPFAM_patent dataset card](https://huggingface.co/datasets/datalyes/DAPFAM_patent).
### Hugging Face Links

- Nano dataset: [hakari-bench/NanoDAPFAM](https://huggingface.co/datasets/hakari-bench/NanoDAPFAM)
- Source dataset: [datalyes/DAPFAM_patent](https://huggingface.co/datasets/datalyes/DAPFAM_patent)

### Source Reference Table

| Title | Year | Type | URL |
| --- | ---: | --- | --- |
| DAPFAM: A Domain-Aware Family-level Dataset to benchmark cross domain patent retrieval | 2026 | arXiv paper | https://arxiv.org/abs/2506.22141 |
| DAPFAM DOI record | 2026 | DOI | https://doi.org/10.1016/j.array.2026.100720 |
| datalyes/DAPFAM_patent | 2025 | dataset card | https://huggingface.co/datasets/datalyes/DAPFAM_patent |

## Machine-Readable Metadata

```yaml
benchmark_task_metadata:
  schema_version: 1
  document_status: first_pass
  nano_set: NanoDAPFAM
  backing_dataset: NanoDAPFAM
  dataset_id: hakari-bench/NanoDAPFAM
  task_name: NanoDAPFAMInTitlAbsToFullText
  split_name: NanoDAPFAMInTitlAbsToFullText
  language: en
  category: natural_language
  document_path: docs/benchmark_tasks/NanoDAPFAM/NanoDAPFAMInTitlAbsToFullText.md
  source_research:
    primary_source_type: benchmark_paper
    paper_pdf_or_html_checked: true
  counts:
    queries: 200
    documents: 10000
    positive_qrels: 3072
    positives_per_query:
      average: 15.36
      min: 1
      median: 18.0
      max: 20
    multi_positive_queries: 198
  text_stats_chars:
    query_mean: 771.33
    document_mean: 68924.3436
  bm25:
    ndcg_at_10: 0.6642533082
    hit_at_10: 0.85
    source: dataset_bm25_column
  learning:
    original_train_split: not_confirmed
    evaluation_split_origin: DAPFAM IN-domain title-abstract to full-text patent-family retrieval
    train_eval_overlap_audit: not_audited
    leakage_note: exclude NanoDAPFAM evaluation family IDs and citation labels
    useful_training_data:
      - same-domain title-abstract patent retrieval
      - full-text patent prior-art search
      - family-level citation retrieval
  synthetic_data:
    document_generation: full-text same-domain patent family records
    question_generation: title and abstract source patent summaries
    answerability: positives should be cited same-domain families
    multi_positive_training: citation_family_multi_positive
  links:
    nano_dataset: https://huggingface.co/datasets/hakari-bench/NanoDAPFAM
    source_urls:
      - label: DAPFAM arXiv
        url: https://arxiv.org/abs/2506.22141
      - label: DAPFAM DOI
        url: https://doi.org/10.1016/j.array.2026.100720
      - label: datalyes/DAPFAM_patent
        url: https://huggingface.co/datasets/datalyes/DAPFAM_patent
  references:
    - title: "DAPFAM: A Domain-Aware Family-level Dataset to benchmark cross domain patent retrieval"
      url: https://arxiv.org/abs/2506.22141
      year: 2026
      doi: 10.1016/j.array.2026.100720
      is_paper: true
      source_confidence: definitive_paper_link
```
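
The `bm25` values recorded in the metadata (nDCG@10 and hit@10, sourced from the dataset's BM25 candidate column) are computed from the best-ranked positive per query. The sketch below shows one plausible reading of that procedure, under stated assumptions: each query contributes only its best-ranked positive, so nDCG@10 reduces to `1 / log2(rank + 1)` against an ideal DCG of 1. The function names and the toy document IDs are hypothetical, and the actual evaluation script may differ.

```python
import math


def best_positive_rank(candidates, positives):
    """Return the 1-based rank of the first positive in candidates, or None."""
    for rank, doc_id in enumerate(candidates, start=1):
        if doc_id in positives:
            return rank
    return None


def hit_at_k(rank, k=10):
    """1.0 if the best-ranked positive is within the top k, else 0.0."""
    return 1.0 if rank is not None and rank <= k else 0.0


def ndcg_at_k_single(rank, k=10):
    """nDCG@k when only the best-ranked positive counts as relevant.

    Assumption: with a single relevant document the ideal DCG is 1,
    so nDCG equals the discount 1 / log2(rank + 1).
    """
    if rank is None or rank > k:
        return 0.0
    return 1.0 / math.log2(rank + 1)


def evaluate(runs, k=10):
    """Average both metrics over (ranked_candidates, positive_ids) pairs."""
    ranks = [best_positive_rank(cands, pos) for cands, pos in runs]
    n = len(ranks)
    return {
        "hit": sum(hit_at_k(r, k) for r in ranks) / n,
        "ndcg": sum(ndcg_at_k_single(r, k) for r in ranks) / n,
    }


# Toy usage with made-up document IDs: best positives at ranks 1, 3, and none.
toy_runs = [
    (["d1", "d2"], {"d1"}),
    (["d3", "d4", "d5"], {"d5", "d8"}),
    (["d6", "d7"], {"d9"}),
]
scores = evaluate(toy_runs)
print(scores)
```

Under this reading, a query whose best positive sits at rank 1 contributes 1.0 to nDCG@10, and a query with no positive in the top 10 contributes 0 to both metrics.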