ningzhuo
/

SongPanda

Model card Files Files and versions

ningzhuo commited on Dec 5, 2025

Commit

aff47e0

·

verified ·

1 Parent(s): e7861aa

Update README.md

Files changed (1) hide show

README.md +55 -5

README.md CHANGED Viewed

@@ -1,5 +1,55 @@
----
-license: apache-2.0
-tags:
-- llama-factory
----

+---
+license: apache-2.0
+tags:
+- llama-factory
+datasets:
+- ningzhuo/SongPanda-Bench
+metrics:
+- accuracy
+- bleu
+base_model:
+- Qwen/Qwen2.5-VL-7B-Instruct
+---
+SongPanda（论文投稿“数字人文”期刊中）
+**模型概述**
+SongPanda 是针对古籍数字化场景优化的视觉语言模型，基于 Qwen2.5-VL-7B 通过 LoRA 微调构建，专注于复杂版式古籍的结构化信息提取，解决传统 OCR 难以区分正文、夹注、版心等字段的痛点。
+**核心功能**
+智能字段区分：自动识别并排除古籍版心无关信息
+夹注精准标注：以标签区分双行小字夹注与正文大字
+复杂版面适配：支持宋至清代及域外刻本等多类型古籍图像
+**性能亮点**
+📊 SOTA 表现：在 SongPanda-Bench 测试集上综合准确度达 0.80，超越 Gemini-2.5-pro 等模型
+💰 低成本优势：单页推理成本仅 0.003 元（3090 服务器），为闭源模型的 1/50
+⚡ 高效推理：平均 8 秒 / 页，支持批量处理古籍图像
+🛡️ 强鲁棒性：适配含噪音、摩尔纹等受损古籍图像
+**快速使用**
+推理示例
+from transformers import AutoProcessor, AutoModelForVision2Seq
+from PIL import Image
+# 加载模型
+model = AutoModelForVision2Seq.from_pretrained("ningzhuo/SongPanda")
+processor = AutoProcessor.from_pretrained("ningzhuo/SongPanda")
+# 处理古籍图像
+image = Image.open("ancient_book_page.jpg").convert("RGB")
+inputs = processor(images=image, text="请提取正文并标注夹注", return_tensors="pt")
+# 生成结果
+outputs = model.generate(**inputs, max_new_tokens=1024)
+print(processor.decode(outputs[0], skip_special_tokens=True))
+**配套数据集**
+SongPanda-Bench：356 张测试图像，源自 105 本宋元明清及域外刻本，含专业标注
+训练数据：2 万余张古籍图像
+**作者团队**
+郑陈锐 ¹，段伟 ²，范怿泽 ¹
+¹ 中山大学中文系 ² 上海师范大学人文学院
+**说明**
+本模型相关的训练细节、技术原理及完整实验结果详见投稿中论文，敬请期待。