Instructions for using LenckCuak/Self-RAG with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use LenckCuak/Self-RAG with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LenckCuak/Self-RAG")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("LenckCuak/Self-RAG", dtype="auto")
```
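As a quick sanity check, the pipeline above can be called directly. A minimal sketch (the instruction-style prompt mirrors the `format_prompt` convention shown later in this README; the raw output will include Self-RAG's reflection tokens):

```python
# Minimal usage sketch for the pipeline loaded above.
out = pipe(
    "### Instruction:\nWhat is Self-RAG?\n\n### Response:\n",
    max_new_tokens=100,
)
print(out[0]["generated_text"])
```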
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use LenckCuak/Self-RAG with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "LenckCuak/Self-RAG"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LenckCuak/Self-RAG",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
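Besides curl, any OpenAI-compatible client can talk to the server. A minimal sketch with the official `openai` Python package (the `api_key` value is arbitrary for a local server):

```python
# Sketch: query the local vLLM server via its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.completions.create(
    model="LenckCuak/Self-RAG",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(resp.choices[0].text)
```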
- SGLang
How to use LenckCuak/Self-RAG with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LenckCuak/Self-RAG" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LenckCuak/Self-RAG",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "LenckCuak/Self-RAG" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LenckCuak/Self-RAG",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use LenckCuak/Self-RAG with Docker Model Runner:
```shell
docker model run hf.co/LenckCuak/Self-RAG
```
Self-RAG Model Repository

This directory holds the pretrained model weights for Self-RAG (Self-Reflective Retrieval-Augmented Generation).

Note: these weights are the official pretrained models downloaded directly from the HuggingFace Hub; they were not trained locally.
Directory structure

```
models/
├── README.md                ← this file
├── selfrag_llama2_7b/       ← Self-RAG 7B generator model (fine-tuned from Llama-2-7B)
│   ├── pytorch_model-*.bin  ← model weights (~13.5GB)
│   ├── config.json          ← model config (vocab_size=32016, incl. reflection special tokens)
│   ├── tokenizer.model      ← SentencePiece tokenizer
│   ├── added_tokens.json    ← added reflection tokens ([Retrieval], [Relevant], [Fully supported], etc.)
│   └── ...
└── contriever-msmarco/      ← Contriever-MSMARCO retriever model (Self-RAG's original default retriever)
    ├── pytorch_model.bin
    ├── config.json
    └── ...
```
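As a sanity check on the reflection tokens listed above, the tokenizer can be inspected directly. A minimal sketch, assuming `transformers` is installed and the local path below exists:

```python
# Sketch: verify the 16 reflection special tokens recorded in added_tokens.json.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "/mnt/public/sichuan_a/nyt/models/Self-RAG/models/selfrag_llama2_7b"
)
print(len(tok))  # expected: 32016 (32000 base + 16 reflection tokens)
print(tok.convert_tokens_to_ids(["[Retrieval]", "[Relevant]", "[Fully supported]"]))
```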
Relationship to the GitHub repositories

Upstream official repository
- Repository: AkariAsai/self-rag
- Paper: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (ICLR 2024, Oral, top 1%)
- Authors: Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi
Our fork
- Repository: 1571859588/self-rag
- Relationship to upstream: forked from AkariAsai/self-rag; the MAEDA branch adds integration support for the MAEDA Benchmark (OpenROAD EDA-domain QA)
- Main extensions:
  - Added BGE embedding retrieval (replacing the original Contriever) for a fair evaluation on the OpenROAD knowledge base; see the sketch after this list
  - Added the MAEDA data-format conversion, post-processing, and evaluation pipeline
  - Fixed vLLM 0.8+ compatibility issues
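For illustration, the BGE retrieval step might look roughly like the following. This is a sketch only: the real implementation is `retrieve_with_bge.py`, the document snippets and query here are hypothetical, and the public `BAAI/bge-large-en-v1.5` checkpoint stands in for the fine-tuned local one used by the pipeline.

```python
# Sketch of BGE + FAISS retrieval as used in place of Contriever.
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
docs = [
    "OpenROAD is an open-source RTL-to-GDSII flow.",
    "Global placement and detailed placement are separate stages.",
    "Clock tree synthesis follows placement.",
]

# Normalized embeddings make inner product equivalent to cosine similarity.
emb = model.encode(docs, normalize_embeddings=True).astype("float32")
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

query = model.encode(["How do I run placement?"], normalize_embeddings=True)
# The pipeline uses ndocs=5; capped here by the toy corpus size.
scores, ids = index.search(query.astype("float32"), min(5, len(docs)))
print([docs[i] for i in ids[0]])
```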
Local code location

Self-RAG is used as a baseline in the MAEDA project; the code lives at:

/mnt/public/sichuan_a/nyt/MAEDA/huada-docqa-demo/huada-docqa-demo/
experiments/regression_test/MAEDA-DATE26/baselines/self-rag/

This directory is a clone of the 1571859588/self-rag repository (working on the MAEDA branch), with git remotes configured as:

origin   → git@github.com:AkariAsai/self-rag.git (upstream official repository)
upstream → git@github.com:1571859588/self-rag.git (our fork)
Model download

selfrag_llama2_7b (generator)

Downloaded from the HuggingFace Hub via git clone (requires git-lfs):

```shell
# Install git-lfs (if not already installed)
apt install git-lfs   # or: conda install git-lfs
git lfs install

# Clone the model
git clone https://huggingface.co/selfrag/selfrag_llama2_7b \
    /mnt/public/sichuan_a/nyt/models/Self-RAG/models/selfrag_llama2_7b
```
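Alternatively, `huggingface_hub` can download the same snapshot without git-lfs. A sketch, assuming the `huggingface_hub` package is installed:

```python
# Sketch: download the generator weights with huggingface_hub instead of git clone.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="selfrag/selfrag_llama2_7b",
    local_dir="/mnt/public/sichuan_a/nyt/models/Self-RAG/models/selfrag_llama2_7b",
)
```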
- HuggingFace page: selfrag/selfrag_llama2_7b
- Base model: meta-llama/Llama-2-7b-hf
- Architecture: LlamaForCausalLM, vocab_size=32016 (original 32000 + 16 reflection special tokens)
- Precision: bfloat16
- Size: ~13.5GB
contriever-msmarco (retriever)

```shell
git clone https://huggingface.co/facebook/contriever-msmarco \
    /mnt/public/sichuan_a/nyt/models/Self-RAG/models/contriever-msmarco
```
- HuggingFace page: facebook/contriever-msmarco
- Purpose: the retriever used in the original Self-RAG paper (the MAEDA evaluation uses BGE instead)
Note: the 13B model can also be downloaded from HuggingFace: selfrag/selfrag_llama2_13b
Environment setup

Recommended environment (pre-configured Conda environment)

The project's pre-configured conda environment can be used directly:

```shell
conda activate huada_docqa_demo_release_v1
```

It includes all necessary dependencies: vllm, torch, transformers, spacy, sentence_transformers, faiss, etc.
Creating the environment from scratch

Option 1: use environment.yml (the original Self-RAG environment)

```shell
cd /mnt/public/sichuan_a/nyt/MAEDA/huada-docqa-demo/huada-docqa-demo/experiments/regression_test/MAEDA-DATE26/baselines/self-rag

# Create the conda environment
conda env create -f environment.yml

# Activate it
conda activate selfrag

# Install flash-attn (requires a CUDA environment)
pip3 install flash-attn==2.3.6

# Install faiss-gpu
conda install -c conda-forge faiss-gpu
```
- Python: 3.8
- PyTorch: 2.1.2
- vLLM: 0.2.6 (the original version; the MAEDA pipeline requires 0.8+ together with the compatibility patches below)
- CUDA: 12.1
Option 2: use requirements.txt

```shell
pip install -r requirements.txt
```
Key dependency versions

| Package | Version | Notes |
|---|---|---|
| vllm | 0.2.6 (original) / 0.8+ (MAEDA pipeline) | inference engine |
| torch | 2.1.2 | |
| transformers | 4.36.2 | |
| deepspeed | 0.12.6 | used for training |
| flash-attn | 2.3.6 | attention optimization |
| faiss-gpu | - | vector retrieval (needed by the MAEDA pipeline) |
| sentence_transformers | - | BGE embedding model (needed by the MAEDA pipeline) |
Training code and data

Note: the models in this directory are official pretrained weights downloaded from HuggingFace; no training was done locally. The following describes the official training process, for reference.

Official training data
- Training dataset: selfrag/selfrag_train_data (150K training instances)
- Google Drive backup: download link
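The training set can be pulled straight from the Hub. A sketch, assuming the `datasets` package is installed and can auto-resolve the repo's data files:

```python
# Sketch: load the official Self-RAG training data from the HuggingFace Hub.
from datasets import load_dataset

train = load_dataset("selfrag/selfrag_train_data", split="train")
print(len(train))  # ~150K instances
```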
Official training process

Self-RAG training has four steps:

1. Critic data creation: use GPT-4 to generate training data for the reflection tokens
   - Code: data_creation/critic/
   - Pre-generated data: download
2. Critic training: fine-tune Llama-2-7B to predict the reflection tokens

```shell
cd data_creation
torchrun --nproc_per_node=2 --master_port=2568 train_special_tokens.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --data_path PATH_TO_TRAIN_DATA_FILE \
    --bf16 True --output_dir PATH_TO_CRITIC_MODEL \
    --num_train_epochs 3 --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 --learning_rate 2e-5 \
    --lr_scheduler_type cosine --fsdp "full_shard auto_wrap"
```

3. Generator data creation: use the Critic and the Retriever to build the generator's training data
   - Code: data_creation/generator/
4. Generator training: train the generator with DeepSpeed

```shell
cd retrieval_lm
bash script_finetune_7b.sh
```

- Hardware requirements: 8× A100 40GB (7B model) / 4× A100 80GB (13B model)
How to run

Quick inference test

```python
from vllm import LLM, SamplingParams

# Load the model
model = LLM(
    "/mnt/public/sichuan_a/nyt/models/Self-RAG/models/selfrag_llama2_7b",
    dtype="half"
)

sampling_params = SamplingParams(
    temperature=0.0, top_p=1.0,
    max_tokens=100, skip_special_tokens=False
)

def format_prompt(input, paragraph=None):
    prompt = "### Instruction:\n{0}\n\n### Response:\n".format(input)
    if paragraph is not None:
        prompt += "[Retrieval]<paragraph>{0}</paragraph>".format(paragraph)
    return prompt

# A question that does not need retrieval
query = "Leave odd one out: twitter, instagram, whatsapp."
preds = model.generate([format_prompt(query)], sampling_params)
print(preds[0].outputs[0].text)

# A question that needs retrieval (insert the retrieved paragraph)
query = "Can you tell me the difference between llamas and alpacas?"
paragraph = "The alpaca (Lama pacos) is a species of South American camelid mammal..."
preds = model.generate([format_prompt(query, paragraph)], sampling_params)
print(preds[0].outputs[0].text)
```
MAEDA Benchmark evaluation (full pipeline)

In the MAEDA project, Self-RAG is evaluated as a baseline, with the BGE retriever replacing the original Contriever:

```shell
# Enter the working directory
cd /mnt/public/sichuan_a/nyt/MAEDA/huada-docqa-demo/huada-docqa-demo/experiments/regression_test/MAEDA-DATE26/baselines/self-rag

# Activate the environment
conda activate huada_docqa_demo_release_v1

# Run the full pipeline (5 steps: retrieve → convert → inference → postprocess → eval)
CUDA_VISIBLE_DEVICES=4 bash run_selfrag_maeda.sh

# Or run a single step
bash run_selfrag_maeda.sh retrieve    # Step 1: BGE retrieval
bash run_selfrag_maeda.sh convert     # Step 2: data-format conversion
bash run_selfrag_maeda.sh inference   # Step 3: Self-RAG inference
bash run_selfrag_maeda.sh postprocess # Step 4: post-processing
bash run_selfrag_maeda.sh eval        # Step 5: MAEDA evaluation
```
Pipeline steps

| Step | Script | Description |
|---|---|---|
| 1. Retrieve | retrieve_with_bge.py | Build a FAISS index over the OpenROAD knowledge base with BGE-large-en-v1.5 and retrieve passages |
| 2. Convert | convert_maeda_to_selfrag.py | Convert the MAEDA benchmark JSON into the Self-RAG input format |
| 3. Inference | retrieval_lm/run_short_form.py | Run adaptive-retrieval inference with the Self-RAG 7B model |
| 4. Postprocess | postprocess_selfrag_output.py | Clean up reflection tokens and format the output for the MAEDA evaluator |
| 5. Evaluate | run_eval.py | Judge answer quality with an external LLM API |
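For illustration, the reflection-token cleanup in step 4 might look roughly like this. A sketch only: the real logic lives in postprocess_selfrag_output.py, the token list comes from added_tokens.json, and `clean_selfrag_output` is a hypothetical name.

```python
# Sketch: strip Self-RAG reflection tokens and evidence spans from raw output.
import re

def clean_selfrag_output(text: str) -> str:
    # Remove inlined evidence, e.g. <paragraph>...</paragraph>.
    text = re.sub(r"<paragraph>.*?</paragraph>", "", text, flags=re.DOTALL)
    # Remove bracketed reflection tokens such as [Retrieval], [Relevant],
    # [Fully supported], [Utility:5], ...
    text = re.sub(r"\[[^\[\]]*\]", "", text)
    return " ".join(text.split())

print(clean_selfrag_output(
    "[Retrieval]<paragraph>...</paragraph>Alpacas are smaller.[Utility:5]"
))  # -> "Alpacas are smaller."
```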
Key configuration
- Generator model path: /mnt/public/sichuan_a/nyt/models/Self-RAG/models/selfrag_llama2_7b
- BGE embedding model path: /mnt/public/sichuan_a/nyt/models/RAG-EDA/models/finetuned-models/embedding/bge-large-en-v1.5/output_flagembedding
- Inference parameters: mode=adaptive_retrieval, ndocs=5, threshold=0.2, max_new_tokens=300 (see the sketch after this list)
- GPU requirement: at least 15GB of VRAM (inference)
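In adaptive_retrieval mode, retrieval is gated on the model's own [Retrieval] reflection-token probability against the threshold above. A minimal sketch of the idea, with token names from the Self-RAG paper's reflection vocabulary; the exact computation is in retrieval_lm/run_short_form.py and may differ:

```python
# Sketch: adaptive retrieval decision from the logprobs of the reflection tokens.
import math

def should_retrieve(lp_retrieval: float, lp_no_retrieval: float,
                    threshold: float = 0.2) -> bool:
    # Normalize the probabilities of [Retrieval] vs [No Retrieval]
    # and retrieve when the normalized mass exceeds the threshold.
    p_ret = math.exp(lp_retrieval)
    p_no = math.exp(lp_no_retrieval)
    return p_ret / (p_ret + p_no) > threshold
```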
Resuming an interrupted inference run

If inference is interrupted (OOM, a vLLM crash, etc.), progress is saved to a temporary file and the run can be resumed with --resume_file:
```shell
CUDA_VISIBLE_DEVICES=4 VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_USE_V1=0 \
python3 retrieval_lm/run_short_form.py \
    --model_name /mnt/public/sichuan_a/nyt/models/Self-RAG/models/selfrag_llama2_7b \
    --input_file maeda_selfrag_data/selfrag_input.json \
    --mode adaptive_retrieval --max_new_tokens 300 --threshold 0.2 \
    --output_file results/selfrag_raw_results.json \
    --metric match --ndocs 5 --use_groundness --use_utility --use_seqscore \
    --dtype half --world_size 1 \
    --resume_file results/selfrag_raw_results.json_tmp
```
MAEDA evaluation results (BGE retrieval)

| Metric | Agent | Value |
|---|---|---|
| Accuracy (judge=True) | — | 3.33% (10/300) |
| Retrieval errors | Ret-Agent | 30.3% (91/300) |
| Missing key points | SC-Agent | 85.3% (256/300) |
| Contradictions | SC-Agent | 13.0% (39/300) |
| Incorrect refusals | RF-Agent | 0.0% (0/300) |
| Incorrect non-refusals | RF-Agent | 0.0% (0/300) |
| Instruction hallucinations | Halluc-Agent | 23.0% (69/300) |
| Example hallucinations | Halluc-Agent | 11.0% (33/300) |

Results directory:
experiments/regression_test/MAEDA-DATE26/baselines/self-rag/results/
vLLM compatibility changes

To support vLLM 0.8+, the following changes were made to retrieval_lm/run_short_form.py:
- Added a _get_logprob() helper (vLLM ≥ 0.8 returns Logprob objects instead of floats); see the sketch after this list
- Added max_logprobs=32016 (vLLM ≥ 0.8 caps returned logprobs at 20 by default)
- Added enforce_eager=True (avoids msgspec serialization errors related to torch.compile)
- Added --start_from and --resume_file arguments (to support resuming after interruption)
- Set VLLM_USE_V1=0 (disables the v1 engine, avoiding a serialization bug with large logprobs)
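A minimal sketch of what the _get_logprob() helper described above might look like (the real one is in retrieval_lm/run_short_form.py):

```python
# Sketch: normalize vLLM logprob entries across versions.
def _get_logprob(entry):
    # vLLM >= 0.8 returns Logprob objects with a .logprob attribute;
    # older versions return plain floats.
    return entry.logprob if hasattr(entry, "logprob") else entry
```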
Citation

```bibtex
@inproceedings{asai2024selfrag,
  author={Asai, Akari and Wu, Zeqiu and Wang, Yizhong and Sil, Avirup and Hajishirzi, Hannaneh},
  title={Self-{RAG}: Learning to Retrieve, Generate, and Critique through Self-Reflection},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=hSyW5go0v8}
}
```