Instructions for using LenckCuak/Self-RAG with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use LenckCuak/Self-RAG with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LenckCuak/Self-RAG")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("LenckCuak/Self-RAG", dtype="auto")
```
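As a quick sanity check, the pipeline above can be called directly. A minimal sketch (the instruction-style prompt mirrors the `format_prompt` convention shown later in this README; the raw output will include Self-RAG's reflection tokens):

```python
# Minimal usage sketch for the pipeline loaded above.
out = pipe(
    "### Instruction:\nWhat is Self-RAG?\n\n### Response:\n",
    max_new_tokens=100,
)
print(out[0]["generated_text"])
```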
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use LenckCuak/Self-RAG with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "LenckCuak/Self-RAG"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LenckCuak/Self-RAG",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
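Besides curl, any OpenAI-compatible client can talk to the server. A minimal sketch with the official `openai` Python package (the `api_key` value is arbitrary for a local server):

```python
# Sketch: query the local vLLM server via its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.completions.create(
    model="LenckCuak/Self-RAG",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(resp.choices[0].text)
```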
- SGLang
How to use LenckCuak/Self-RAG with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "LenckCuak/Self-RAG" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LenckCuak/Self-RAG",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "LenckCuak/Self-RAG" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "LenckCuak/Self-RAG",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use LenckCuak/Self-RAG with Docker Model Runner:
```shell
docker model run hf.co/LenckCuak/Self-RAG
```
Self-RAG Model Repository

This directory holds the pretrained model weights for Self-RAG (Self-Reflective Retrieval-Augmented Generation).

Note: these weights are the official pretrained models downloaded directly from the HuggingFace Hub; they were not trained locally.
Directory structure

```
models/
├── README.md                ← this file
├── selfrag_llama2_7b/       ← Self-RAG 7B generator model (fine-tuned from Llama-2-7B)
│   ├── pytorch_model-*.bin  ← model weights (~13.5GB)
│   ├── config.json          ← model config (vocab_size=32016, incl. reflection special tokens)
│   ├── tokenizer.model      ← SentencePiece tokenizer
│   ├── added_tokens.json    ← added reflection tokens ([Retrieval], [Relevant], [Fully supported], etc.)
│   └── ...
└── contriever-msmarco/      ← Contriever-MSMARCO retriever model (Self-RAG's original default retriever)
    ├── pytorch_model.bin
    ├── config.json
    └── ...
```
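As a sanity check on the reflection tokens listed above, the tokenizer can be inspected directly. A minimal sketch, assuming `transformers` is installed and the local path below exists:

```python
# Sketch: verify the 16 reflection special tokens recorded in added_tokens.json.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "/mnt/public/sichuan_a/nyt/models/Self-RAG/models/selfrag_llama2_7b"
)
print(len(tok))  # expected: 32016 (32000 base + 16 reflection tokens)
print(tok.convert_tokens_to_ids(["[Retrieval]", "[Relevant]", "[Fully supported]"]))
```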
Relationship to the GitHub repositories

Upstream official repository
- Repository: AkariAsai/self-rag
- Paper: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (ICLR 2024, Oral, top 1%)
- Authors: Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi
Our fork
- Repository: 1571859588/self-rag
- Relationship to upstream: forked from AkariAsai/self-rag; the MAEDA branch adds integration support for the MAEDA Benchmark (OpenROAD EDA-domain QA)
- Main extensions:
  - Added BGE embedding retrieval (replacing the original Contriever) for a fair evaluation on the OpenROAD knowledge base; see the sketch after this list
  - Added the MAEDA data-format conversion, post-processing, and evaluation pipeline
  - Fixed vLLM 0.8+ compatibility issues
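For illustration, the BGE retrieval step might look roughly like the following. This is a sketch only: the real implementation is `retrieve_with_bge.py`, the document snippets and query here are hypothetical, and the public `BAAI/bge-large-en-v1.5` checkpoint stands in for the fine-tuned local one used by the pipeline.

```python
# Sketch of BGE + FAISS retrieval as used in place of Contriever.
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
docs = [
    "OpenROAD is an open-source RTL-to-GDSII flow.",
    "Global placement and detailed placement are separate stages.",
    "Clock tree synthesis follows placement.",
]

# Normalized embeddings make inner product equivalent to cosine similarity.
emb = model.encode(docs, normalize_embeddings=True).astype("float32")
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

query = model.encode(["How do I run placement?"], normalize_embeddings=True)
# The pipeline uses ndocs=5; capped here by the toy corpus size.
scores, ids = index.search(query.astype("float32"), min(5, len(docs)))
print([docs[i] for i in ids[0]])
```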
Local code location

Self-RAG is used as a baseline in the MAEDA project; the code lives at:

/mnt/public/sichuan_a/nyt/MAEDA/huada-docqa-demo/huada-docqa-demo/
experiments/regression_test/MAEDA-DATE26/baselines/self-rag/

This directory is a clone of the 1571859588/self-rag repository (working on the MAEDA branch), with git remotes configured as:

origin   → git@github.com:AkariAsai/self-rag.git (upstream official repository)
upstream → git@github.com:1571859588/self-rag.git (our fork)
Model download

selfrag_llama2_7b (generator)

Downloaded from the HuggingFace Hub via git clone (requires git-lfs):

```shell
# Install git-lfs (if not already installed)
apt install git-lfs   # or: conda install git-lfs
git lfs install

# Clone the model
git clone https://huggingface.co/selfrag/selfrag_llama2_7b \
    /mnt/public/sichuan_a/nyt/models/Self-RAG/models/selfrag_llama2_7b
```
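Alternatively, `huggingface_hub` can download the same snapshot without git-lfs. A sketch, assuming the `huggingface_hub` package is installed:

```python
# Sketch: download the generator weights with huggingface_hub instead of git clone.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="selfrag/selfrag_llama2_7b",
    local_dir="/mnt/public/sichuan_a/nyt/models/Self-RAG/models/selfrag_llama2_7b",
)
```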
- HuggingFace page: selfrag/selfrag_llama2_7b
- Base model: meta-llama/Llama-2-7b-hf
- Architecture: LlamaForCausalLM, vocab_size=32016 (original 32000 + 16 reflection special tokens)
- Precision: bfloat16
- Size: ~13.5GB
contriever-msmarco (retriever)

```shell
git clone https://huggingface.co/facebook/contriever-msmarco \
    /mnt/public/sichuan_a/nyt/models/Self-RAG/models/contriever-msmarco
```
- HuggingFace page: facebook/contriever-msmarco
- Purpose: the retriever used in the original Self-RAG paper (the MAEDA evaluation uses BGE instead)
Note: the 13B model can also be downloaded from HuggingFace: selfrag/selfrag_llama2_13b
Environment setup

Recommended environment (pre-configured Conda environment)

The project's pre-configured conda environment can be used directly:

```shell
conda activate huada_docqa_demo_release_v1
```

It includes all necessary dependencies: vllm, torch, transformers, spacy, sentence_transformers, faiss, etc.
Creating the environment from scratch

Option 1: use environment.yml (the original Self-RAG environment)

```shell
cd /mnt/public/sichuan_a/nyt/MAEDA/huada-docqa-demo/huada-docqa-demo/experiments/regression_test/MAEDA-DATE26/baselines/self-rag

# Create the conda environment
conda env create -f environment.yml

# Activate it
conda activate selfrag

# Install flash-attn (requires a CUDA environment)
pip3 install flash-attn==2.3.6

# Install faiss-gpu
conda install -c conda-forge faiss-gpu
```
- Python: 3.8
- PyTorch: 2.1.2
- vLLM: 0.2.6 (the original version; the MAEDA pipeline requires 0.8+ together with the compatibility patches below)
- CUDA: 12.1
Option 2: use requirements.txt

```shell
pip install -r requirements.txt
```
Key dependency versions

| Package | Version | Notes |
|---|---|---|
| vllm | 0.2.6 (original) / 0.8+ (MAEDA pipeline) | inference engine |
| torch | 2.1.2 | |
| transformers | 4.36.2 | |
| deepspeed | 0.12.6 | used for training |
| flash-attn | 2.3.6 | attention optimization |
| faiss-gpu | - | vector retrieval (needed by the MAEDA pipeline) |
| sentence_transformers | - | BGE embedding model (needed by the MAEDA pipeline) |
Training code and data

Note: the models in this directory are official pretrained weights downloaded from HuggingFace; no training was done locally. The following describes the official training process, for reference.

Official training data
- Training dataset: selfrag/selfrag_train_data (150K training instances)
- Google Drive backup: download link
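The training set can be pulled straight from the Hub. A sketch, assuming the `datasets` package is installed and can auto-resolve the repo's data files:

```python
# Sketch: load the official Self-RAG training data from the HuggingFace Hub.
from datasets import load_dataset

train = load_dataset("selfrag/selfrag_train_data", split="train")
print(len(train))  # ~150K instances
```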
Official training process

Self-RAG training has four steps:

1. Critic data creation: use GPT-4 to generate training data for the reflection tokens
   - Code: data_creation/critic/
   - Pre-generated data: download
2. Critic training: fine-tune Llama-2-7B to predict the reflection tokens

```shell
cd data_creation
torchrun --nproc_per_node=2 --master_port=2568 train_special_tokens.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --data_path PATH_TO_TRAIN_DATA_FILE \
    --bf16 True --output_dir PATH_TO_CRITIC_MODEL \
    --num_train_epochs 3 --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 --learning_rate 2e-5 \
    --lr_scheduler_type cosine --fsdp "full_shard auto_wrap"
```

3. Generator data creation: use the Critic and the Retriever to build the generator's training data
   - Code: data_creation/generator/
4. Generator training: train the generator with DeepSpeed

```shell
cd retrieval_lm
bash script_finetune_7b.sh
```

- Hardware requirements: 8× A100 40GB (7B model) / 4× A100 80GB (13B model)
How to run

Quick inference test

```python
from vllm import LLM, SamplingParams

# Load the model
model = LLM(
    "/mnt/public/sichuan_a/nyt/models/Self-RAG/models/selfrag_llama2_7b",
    dtype="half"
)

sampling_params = SamplingParams(
    temperature=0.0, top_p=1.0,
    max_tokens=100, skip_special_tokens=False
)

def format_prompt(input, paragraph=None):
    prompt = "### Instruction:\n{0}\n\n### Response:\n".format(input)
    if paragraph is not None:
        prompt += "[Retrieval]<paragraph>{0}</paragraph>".format(paragraph)
    return prompt

# A question that does not need retrieval
query = "Leave odd one out: twitter, instagram, whatsapp."
preds = model.generate([format_prompt(query)], sampling_params)
print(preds[0].outputs[0].text)

# A question that needs retrieval (insert the retrieved paragraph)
query = "Can you tell me the difference between llamas and alpacas?"
paragraph = "The alpaca (Lama pacos) is a species of South American camelid mammal..."
preds = model.generate([format_prompt(query, paragraph)], sampling_params)
print(preds[0].outputs[0].text)
```
MAEDA Benchmark evaluation (full pipeline)

In the MAEDA project, Self-RAG is evaluated as a baseline, with the BGE retriever replacing the original Contriever:

```shell
# Enter the working directory
cd /mnt/public/sichuan_a/nyt/MAEDA/huada-docqa-demo/huada-docqa-demo/experiments/regression_test/MAEDA-DATE26/baselines/self-rag

# Activate the environment
conda activate huada_docqa_demo_release_v1

# Run the full pipeline (5 steps: retrieve → convert → inference → postprocess → eval)
CUDA_VISIBLE_DEVICES=4 bash run_selfrag_maeda.sh

# Or run a single step
bash run_selfrag_maeda.sh retrieve    # Step 1: BGE retrieval
bash run_selfrag_maeda.sh convert     # Step 2: data-format conversion
bash run_selfrag_maeda.sh inference   # Step 3: Self-RAG inference
bash run_selfrag_maeda.sh postprocess # Step 4: post-processing
bash run_selfrag_maeda.sh eval        # Step 5: MAEDA evaluation
```
Pipeline steps

| Step | Script | Description |
|---|---|---|
| 1. Retrieve | retrieve_with_bge.py | Build a FAISS index over the OpenROAD knowledge base with BGE-large-en-v1.5 and retrieve passages |
| 2. Convert | convert_maeda_to_selfrag.py | Convert the MAEDA benchmark JSON into the Self-RAG input format |
| 3. Inference | retrieval_lm/run_short_form.py | Run adaptive-retrieval inference with the Self-RAG 7B model |
| 4. Postprocess | postprocess_selfrag_output.py | Clean up reflection tokens and format the output for the MAEDA evaluator |
| 5. Evaluate | run_eval.py | Judge answer quality with an external LLM API |
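For illustration, the reflection-token cleanup in step 4 might look roughly like this. A sketch only: the real logic lives in postprocess_selfrag_output.py, the token list comes from added_tokens.json, and `clean_selfrag_output` is a hypothetical name.

```python
# Sketch: strip Self-RAG reflection tokens and evidence spans from raw output.
import re

def clean_selfrag_output(text: str) -> str:
    # Remove inlined evidence, e.g. <paragraph>...</paragraph>.
    text = re.sub(r"<paragraph>.*?</paragraph>", "", text, flags=re.DOTALL)
    # Remove bracketed reflection tokens such as [Retrieval], [Relevant],
    # [Fully supported], [Utility:5], ...
    text = re.sub(r"\[[^\[\]]*\]", "", text)
    return " ".join(text.split())

print(clean_selfrag_output(
    "[Retrieval]<paragraph>...</paragraph>Alpacas are smaller.[Utility:5]"
))  # -> "Alpacas are smaller."
```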
Key configuration
- Generator model path: /mnt/public/sichuan_a/nyt/models/Self-RAG/models/selfrag_llama2_7b
- BGE embedding model path: /mnt/public/sichuan_a/nyt/models/RAG-EDA/models/finetuned-models/embedding/bge-large-en-v1.5/output_flagembedding
- Inference parameters: mode=adaptive_retrieval, ndocs=5, threshold=0.2, max_new_tokens=300 (see the sketch after this list)
- GPU requirement: at least 15GB of VRAM (inference)
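In adaptive_retrieval mode, retrieval is gated on the model's own [Retrieval] reflection-token probability against the threshold above. A minimal sketch of the idea, with token names from the Self-RAG paper's reflection vocabulary; the exact computation is in retrieval_lm/run_short_form.py and may differ:

```python
# Sketch: adaptive retrieval decision from the logprobs of the reflection tokens.
import math

def should_retrieve(lp_retrieval: float, lp_no_retrieval: float,
                    threshold: float = 0.2) -> bool:
    # Normalize the probabilities of [Retrieval] vs [No Retrieval]
    # and retrieve when the normalized mass exceeds the threshold.
    p_ret = math.exp(lp_retrieval)
    p_no = math.exp(lp_no_retrieval)
    return p_ret / (p_ret + p_no) > threshold
```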
Resuming an interrupted inference run

If inference is interrupted (OOM, a vLLM crash, etc.), progress is saved to a temporary file and the run can be resumed with --resume_file:
```shell
CUDA_VISIBLE_DEVICES=4 VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_USE_V1=0 \
python3 retrieval_lm/run_short_form.py \
    --model_name /mnt/public/sichuan_a/nyt/models/Self-RAG/models/selfrag_llama2_7b \
    --input_file maeda_selfrag_data/selfrag_input.json \
    --mode adaptive_retrieval --max_new_tokens 300 --threshold 0.2 \
    --output_file results/selfrag_raw_results.json \
    --metric match --ndocs 5 --use_groundness --use_utility --use_seqscore \
    --dtype half --world_size 1 \
    --resume_file results/selfrag_raw_results.json_tmp
```
MAEDA evaluation results (BGE retrieval)

| Metric | Agent | Value |
|---|---|---|
| Accuracy (judge=True) | — | 3.33% (10/300) |
| Retrieval errors | Ret-Agent | 30.3% (91/300) |
| Missing key points | SC-Agent | 85.3% (256/300) |
| Contradictions | SC-Agent | 13.0% (39/300) |
| Incorrect refusals | RF-Agent | 0.0% (0/300) |
| Incorrect non-refusals | RF-Agent | 0.0% (0/300) |
| Instruction hallucinations | Halluc-Agent | 23.0% (69/300) |
| Example hallucinations | Halluc-Agent | 11.0% (33/300) |

Results directory:
experiments/regression_test/MAEDA-DATE26/baselines/self-rag/results/
vLLM compatibility changes

To support vLLM 0.8+, the following changes were made to retrieval_lm/run_short_form.py:
- Added a _get_logprob() helper (vLLM ≥ 0.8 returns Logprob objects instead of floats); see the sketch after this list
- Added max_logprobs=32016 (vLLM ≥ 0.8 caps returned logprobs at 20 by default)
- Added enforce_eager=True (avoids msgspec serialization errors related to torch.compile)
- Added --start_from and --resume_file arguments (to support resuming after interruption)
- Set VLLM_USE_V1=0 (disables the v1 engine, avoiding a serialization bug with large logprobs)
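A minimal sketch of what the _get_logprob() helper described above might look like (the real one is in retrieval_lm/run_short_form.py):

```python
# Sketch: normalize vLLM logprob entries across versions.
def _get_logprob(entry):
    # vLLM >= 0.8 returns Logprob objects with a .logprob attribute;
    # older versions return plain floats.
    return entry.logprob if hasattr(entry, "logprob") else entry
```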
Citation

```bibtex
@inproceedings{asai2024selfrag,
  author={Asai, Akari and Wu, Zeqiu and Wang, Yizhong and Sil, Avirup and Hajishirzi, Hannaneh},
  title={Self-{RAG}: Learning to Retrieve, Generate, and Critique through Self-Reflection},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=hSyW5go0v8}
}
```