Create README.md
Browse files
README.md
ADDED
|
@@ -0,0 +1,124 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
frameworks:
|
| 3 |
+
- Pytorch
|
| 4 |
+
tasks:
|
| 5 |
+
- universal-information-extraction
|
| 6 |
+
base_model:
|
| 7 |
+
- Qwen/Qwen3-0.6B
|
| 8 |
+
base_model_relation: finetune
|
| 9 |
+
license: apache-2.0
|
| 10 |
+
---
|
| 11 |
+
|
| 12 |
+
# SmartResume - 智能简历解析系统
|
| 13 |
+
|
| 14 |
+
<div align="center">
|
| 15 |
+
<img src="assets/logo.png" alt="SmartResume Logo" width="80%" >
|
| 16 |
+
</div>
|
| 17 |
+
|
| 18 |
+
|
| 19 |
+
<p align="center">
|
| 20 |
+
🤗 <a href="https://www.modelscope.cn/models/Alibaba-EI/SmartResume">Model</a>   |   🤖 <a href="https://modelscope.cn/studios/Alibaba-EI/SmartResumeDemo/summary">Demo</a>   |   📑 <a href="https://arxiv.org/abs/2510.09722">Technical Report</a>
|
| 21 |
+
</p>
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
## 项目介绍
|
| 25 |
+
|
| 26 |
+
SmartResume 是一个面向版面结构的智能简历解析系统,系统支持 PDF、图片及常见 Office 文档格式,融合 OCR 与 PDF 元数据完成文本提取,结合版面检测重建阅读顺序,并通过 LLM 将内容转换为结构化字段(如:基本信息、教育经历、工作经历等)。系统同时支持远程 API 和本地模型部署,提供灵活的使用方式。
|
| 27 |
+
<div align="center">
|
| 28 |
+
<img src="assets/image.png" alt="pipline">
|
| 29 |
+
</div>
|
| 30 |
+
|
| 31 |
+
## 基准测试
|
| 32 |
+
为了全面评估我们提出的框架,我们将其与一系列简历抽取基线进行比较,并在我们的流程中对大语言模型API进行基准测试。
|
| 33 |
+
<div align="center">
|
| 34 |
+
<img src="assets/results.jpg" alt="demo">
|
| 35 |
+
</div>
|
| 36 |
+
|
| 37 |
+
|
| 38 |
+
## 模型权重文件说明
|
| 39 |
+
|
| 40 |
+
本仓库包含 SmartResume 项目所需的两个核心权重文件,用于简历信息提取和版面分析。
|
| 41 |
+
|
| 42 |
+
### 1. Qwen3-0.6B 大语言模型
|
| 43 |
+
|
| 44 |
+
**用途**: 简历文本信息提取和结构化处理
|
| 45 |
+
**基础模型**: Qwen/Qwen3-0.6B
|
| 46 |
+
**模型类型**: 微调 (Instruction-tuned)
|
| 47 |
+
|
| 48 |
+
#### 目录结构
|
| 49 |
+
|
| 50 |
+
```
|
| 51 |
+
Qwen3-0.6B/
|
| 52 |
+
├── model.safetensors # 模型权重文件 (主要文件)
|
| 53 |
+
├── config.json # 模型配置文件
|
| 54 |
+
├── generation_config.json # 生成配置
|
| 55 |
+
├── tokenizer.json # 分词器主文件
|
| 56 |
+
├── tokenizer_config.json # 分词器配置
|
| 57 |
+
├── vocab.json # 词汇表
|
| 58 |
+
├── merges.txt # BPE合并规则
|
| 59 |
+
├── special_tokens_map.json # 特殊token映射
|
| 60 |
+
└── added_tokens.json # 额外添加的token
|
| 61 |
+
```
|
| 62 |
+
|
| 63 |
+
**功能特点**:
|
| 64 |
+
|
| 65 |
+
- 专门针对简历信息提取任务微调
|
| 66 |
+
- 能够提取基本信息、工作经历、教育背景等结构化信息
|
| 67 |
+
- 高精度、轻量级模型,推理速度快
|
| 68 |
+
|
| 69 |
+
### 2. YOLOv10 版面检测模型
|
| 70 |
+
|
| 71 |
+
**用途**: 简历版面布局检测和区域分割
|
| 72 |
+
**模型文件**: best.onnx (约 265.81 MB)
|
| 73 |
+
**任务类型**: 目标检测 (Object Detection)
|
| 74 |
+
|
| 75 |
+
#### 目录结构
|
| 76 |
+
|
| 77 |
+
```
|
| 78 |
+
yolov10/
|
| 79 |
+
└── best.onnx # YOLOv10 训练好的权重文件
|
| 80 |
+
```
|
| 81 |
+
|
| 82 |
+
**功能特点**:
|
| 83 |
+
|
| 84 |
+
- 支持多种版面布局识别
|
| 85 |
+
- 高精度区域定位
|
| 86 |
+
- 为文本提取提供准确的区域信息
|
| 87 |
+
|
| 88 |
+
## 使用方式
|
| 89 |
+
|
| 90 |
+
#### 您可以通过如下git clone命令,或者ModelScope SDK来下载模型
|
| 91 |
+
|
| 92 |
+
SDK下载
|
| 93 |
+
|
| 94 |
+
```bash
|
| 95 |
+
#安装ModelScope
|
| 96 |
+
pip install modelscope
|
| 97 |
+
```
|
| 98 |
+
|
| 99 |
+
```python
|
| 100 |
+
#SDK模型下载
|
| 101 |
+
from modelscope import snapshot_download
|
| 102 |
+
model_dir = snapshot_download('Alibaba_EI/SmartResume')
|
| 103 |
+
```
|
| 104 |
+
|
| 105 |
+
Git下载
|
| 106 |
+
|
| 107 |
+
```
|
| 108 |
+
#Git模型下载
|
| 109 |
+
git clone https://www.modelscope.cn/Alibaba_EI/SmartResume.git
|
| 110 |
+
```
|
| 111 |
+
|
| 112 |
+
## Citation
|
| 113 |
+
|
| 114 |
+
```bibtex
|
| 115 |
+
@article{Zhu2025SmartResume,
|
| 116 |
+
title={Layout-Aware Parsing Meets Efficient LLMs: A Unified, Scalable Framework for Resume Information Extraction and Evaluation},
|
| 117 |
+
author={Fanwei Zhu and Jinke Yu and Zulong Chen and Ying Zhou and Junhao Ji and Zhibo Yang and Yuxue Zhang and Haoyuan Hu and Zhenghao Liu},
|
| 118 |
+
journal={arXiv preprint arXiv:2510.09722},
|
| 119 |
+
year={2025},
|
| 120 |
+
url={https://arxiv.org/abs/2510.09722}
|
| 121 |
+
}
|
| 122 |
+
```
|
| 123 |
+
|
| 124 |
+
<p style="color: lightgrey;">如果您是本模型的贡献者,我们邀请您根据<a href="https://modelscope.cn/docs/ModelScope%E6%A8%A1%E5%9E%8B%E6%8E%A5%E5%85%A5%E6%B5%81%E7%A8%8B%E6%A6%82%E8%A7%88" style="color: lightgrey; text-decoration: underline;">模型贡献文档</a>,及时完善模型卡片内容。</p>
|