---
license: other
language:
- en
tags:
- temporal-reasoning
- knowledge-graph
- graphrag
- retrieval-augmented-generation
- lora
- peft
pretty_name: Temporal-Aware GraphRAG artifacts (anonymous)
---

# Temporal-Aware GraphRAG: anonymous data archive

Anonymous double-blind artifact accompanying the EMNLP submission
*"The Lever Is the Prompt: Retrieval-Conditioned Prompting in Temporal-Aware GraphRAG."*
This is the **data + trained-adapter** half; the **code** is in the companion
anonymous repository linked from the paper. See `DATASHEET.md` for provenance.

> No author, affiliation, or identifying information is included. Please do not
> attempt to de-anonymize.

## Contents

```
adapters/    LoRA adapters (inference-ready: adapter_model.safetensors +
             adapter_config.json + tokenizer + chat_template). 14 policies:
               sft-v3{,-seed1337,-seed7}         Qwen3-8B headline (3 seeds)
               sft-llama31{,-seed1337,-seed7}    Llama-3.1-8B cross-arch (3 seeds)
               sft-mistral{,-seed1337,-seed7}    Mistral-7B cross-arch (3 seeds)
               sft-multitq{,-seed1337,-seed7}    MultiTQ cross-benchmark (3 seeds)
               sft-multitq-{llama,mistral}       MultiTQ cross-arch
eval/        Per-question predictions + gold for every reported run (74 JSONs):
             TempBench 3-seed, 3-hop ablation, empty/shuffled-evidence,
             data-scale (1k/2k), full-test LLM-judge runs, MultiTQ, baselines.
benchmark/   benchmark_labelled.jsonl + labels.tsv (the benchmark of [anon-bench],
             provided as a read-only input; see PROVENANCE.txt).
sft_data/    Retrieval-conditioned SFT corpora (CoT, terse, MultiTQ).
logs/        Training / evaluation logs (negative-result evidence).
```

## Mapping to the code repository

Place these so the code repo's scripts find them:

```
adapters//          -> checkpoints//final/
eval/*.json         -> outputs/eval/
benchmark/*         -> outputs/benchmark/
sft_data/*.jsonl    -> outputs/
```

## Notes

- Adapters are **inference-only** (training state / optimizer checkpoints removed).
- Three contaminated judge runs (errored API calls) are **excluded**; all reported
  judge-EM numbers come from the clean full-test runs included here.

## License

Mixed; the `license: other` tag reflects this:

- **Data** (eval outputs, benchmark, SFT corpora): derived from Wikidata (CC0)
  via TGB 2.0; permissive.
- **LoRA adapters**: derivative works of their base models and governed by those
  licenses: Qwen3-8B (Apache-2.0), Llama-3.1-8B-Instruct (Llama 3.1 Community
  License), Mistral-7B-Instruct-v0.3 (Apache-2.0). Use of the Llama-based
  adapters is subject to the Llama 3.1 Community License.
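
The placement described under "Mapping to the code repository" could be scripted roughly as follows. This is a minimal sketch, not part of the released code: the function names `plan_placement` and `place` are hypothetical, and it assumes that the empty path segment in the `adapters//` rule stands for each adapter's directory name (so `adapters/sft-v3/` would land in `checkpoints/sft-v3/final/`). Check that assumption against the code repository before relying on it.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def plan_placement(archive: Path, repo: Path) -> list[tuple[Path, Path]]:
    """Build (source, destination) pairs that follow the mapping table."""
    pairs: list[tuple[Path, Path]] = []

    # ASSUMPTION: every subdirectory of adapters/ is one adapter and maps to
    # checkpoints/<same directory name>/final/ in the code repo.
    adapters_dir = archive / "adapters"
    if adapters_dir.is_dir():
        for adapter in sorted(p for p in adapters_dir.iterdir() if p.is_dir()):
            pairs.append((adapter, repo / "checkpoints" / adapter.name / "final"))

    # eval/*.json -> outputs/eval/
    for src in sorted((archive / "eval").glob("*.json")):
        pairs.append((src, repo / "outputs" / "eval" / src.name))

    # benchmark/* -> outputs/benchmark/
    for src in sorted((archive / "benchmark").glob("*")):
        pairs.append((src, repo / "outputs" / "benchmark" / src.name))

    # sft_data/*.jsonl -> outputs/
    for src in sorted((archive / "sft_data").glob("*.jsonl")):
        pairs.append((src, repo / "outputs" / src.name))

    return pairs


def place(archive: Path, repo: Path) -> int:
    """Copy every planned item into the repo; return the number of items copied."""
    copied = 0
    for src, dst in plan_placement(archive, repo):
        dst.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            # Merge into an existing directory instead of failing.
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)
        copied += 1
    return copied
```

A call such as `place(Path("path/to/this-archive"), Path("path/to/code-repo"))` (both paths are placeholders) copies rather than moves, so the downloaded archive stays intact. `logs/` has no entry in the mapping, so the sketch leaves it alone.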