Update README.md
README.md
CHANGED
@@ -1,3 +1,84 @@
----
-license: apache-2.0
----
---
license: apache-2.0
---
---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- qwen
- recruitment
- LoRA
base_model:
- Qwen/Qwen3-Embedding-8B
---

# CRE v1.1: CareerInternational Recruitment Embedding Model 🚀
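
> **CRE v1.1** is an LLM-based embedding model adapted to the recruitment domain. Compared with traditional BERT-style models, it uses long-context fusion and instruction control to produce stronger semantic representations, targeting the alignment of heterogeneous text between job descriptions (JDs) and résumés (CVs).
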
---
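
### 📖 Technical Background (Technical Report Summary)

**2025/06/28 Released the CRE v1.1 model and technical report.** This work studies the domain-adaptation mechanism of LLM-based embedding models for recruitment semantic matching. Its main findings are:

1. **Effectiveness of the adaptation training paradigm**: **LoRA lightweight fine-tuning** combined with **domain-specific synthetic data** significantly improves performance on the three core matching tasks: JD2JD, JD2CV, and CV2CV.
2. **A new direction in the technology**: LLM-based embeddings natively support multi-granularity semantic parsing (for example, capturing hypernym/hyponym relations between skills), which avoids structural bottlenecks of traditional models.
3. **Industrial deployment value**: when **Enhanced Query Construction** is used during training and raw queries are used directly at test time, the model remains highly robust and practical.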
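
The summary above does not spell out how Enhanced Query Construction works. As a minimal sketch, assuming it resembles the instruction-prefixed query format used by the Qwen3-Embedding base model (`Instruct: ...` followed by `Query:...`), the train/test asymmetry in point 3 might look like the following. The task description string and both function names are hypothetical and are not taken from the technical report.

```python
# Hypothetical task description; not taken from the CRE technical report.
TASK = "Given a job description, retrieve resumes that match the role"


def build_enhanced_query(query: str, task: str = TASK) -> str:
    """Prefix the raw query with a task instruction (assumed training-time form)."""
    return f"Instruct: {task}\nQuery:{query}"


def build_raw_query(query: str) -> str:
    """Return the query unchanged (the test-time form described above)."""
    return query


raw = "Java expert with distributed systems experience"
train_input = build_enhanced_query(raw)  # what the model would see during training
test_input = build_raw_query(raw)        # what the model would see at inference

print(train_input)
print(test_input)
```

Under this reading, the robustness claim means the embedding of `test_input` still retrieves well even though the model was trained on inputs shaped like `train_input`.
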
---
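
### Key Features

* **Domain Adaptation**: built around **LoRA + synthetic data**, offering an efficient, low-cost, and reliable path to deploying embeddings in complex recruitment scenarios.
* **Heterogeneous Alignment**: handles the information asymmetry and differences in writing conventions between JDs and résumés with strong semantic mapping.
* **Multi-granularity semantic parsing**: captures hierarchical and evolutionary relations between skills, supporting more precise person-job matching.
* **Robust by design**: performance stays stable even when the query format differs between training and testing.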
---

### Quick Start

#### 1. Dependencies

```bash
pip install "transformers>=4.51.0" "sentence-transformers>=3.0.0"
```

#### 2. Using Sentence-Transformers

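```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("JayThinkDiff/CRE-1.1")

# Job-side queries (sample inputs kept in the original Chinese, glossed in comments)
queries = [
    # "We need a Java expert with experience developing large-scale distributed systems"
    "我们需要一名具备大型分布式系统开发经验的 Java 专家",
    # "Looking for an algorithm engineer proficient in PyTorch and large-model fine-tuning"
    "寻找精通 PyTorch 和大模型微调的算法工程师",
]
# Candidate-side documents
documents = [
    # "Candidate A: 8 years of Java development; led R&D of financial-grade
    #  distributed middleware at a major tech company; proficient in Spring Cloud."
    "候选人 A:8 年 Java 开发经验,曾主导某大厂金融级分布式中间件研发,精通 Spring Cloud。",
    # "Candidate B: master's in computer science; research focus on NLP;
    #  skilled at LoRA fine-tuning of LLMs with PyTorch."
    "候选人 B:计算机硕士,研究方向为自然语言处理,熟练使用 PyTorch 进行 LLM 的 LoRA 微调。",
]

query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

# Compute cosine similarity between every query and every document
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
```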
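
`model.similarity` returns a queries × documents score matrix. The sketch below shows one way to turn such a matrix into a per-query ranking. The scores are made-up placeholders, not outputs of CRE v1.1, and `rank_documents` is a hypothetical helper rather than part of any library.

```python
from typing import List, Tuple


def rank_documents(
    scores: List[List[float]], top_k: int = 1
) -> List[List[Tuple[int, float]]]:
    """For each query row, return the top_k (document_index, score) pairs, best first."""
    rankings = []
    for row in scores:
        # Pair each score with its document index, then sort by score descending.
        ordered = sorted(enumerate(row), key=lambda pair: pair[1], reverse=True)
        rankings.append(ordered[:top_k])
    return rankings


# Placeholder scores with shape [2 queries, 2 documents]; illustrative values only.
placeholder_scores = [
    [0.80, 0.30],
    [0.25, 0.75],
]

best = rank_documents(placeholder_scores, top_k=1)
print(best)
```

With the real model, the tensor from the Quick Start example could be passed as `rank_documents(similarity.tolist())`.
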

### 🛠️ Technical Specifications

* **Pooling Strategy**: use the model's default pooling (typically last-token or CLS).
* **Inference Optimization**: for long texts, enabling `flash_attention_2` and setting `padding_side="left"` is strongly recommended.
* **Task Support**: optimized for the recruitment-domain tasks JD2JD, JD2CV, and CV2CV.
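
A minimal sketch of how the inference settings above might be passed to Sentence-Transformers, using its `model_kwargs` and `tokenizer_kwargs` constructor arguments. The helper names `build_load_kwargs` and `load_model` are hypothetical, and the snippet assumes a GPU with the `flash-attn` package installed when FlashAttention-2 is enabled.

```python
def build_load_kwargs(use_flash_attention: bool = True) -> dict:
    """Assemble keyword arguments for SentenceTransformer(...)."""
    model_kwargs = {}
    if use_flash_attention:
        # Forwarded to the underlying Transformers model loader.
        model_kwargs["attn_implementation"] = "flash_attention_2"
    return {
        "model_kwargs": model_kwargs,
        # Left padding keeps the final position of every sequence a real token.
        "tokenizer_kwargs": {"padding_side": "left"},
    }


def load_model(model_id: str = "JayThinkDiff/CRE-1.1", use_flash_attention: bool = True):
    """Load the model with the recommended settings (downloads weights when called)."""
    # Imported lazily so build_load_kwargs can be inspected without the library installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id, **build_load_kwargs(use_flash_attention))


print(build_load_kwargs())
```

FlashAttention-2 only runs in fp16 or bf16, so a half-precision dtype may also need to be supplied through `model_kwargs`. Calling `load_model(use_flash_attention=False)` falls back to the default attention implementation.
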
---

### 📜 Citation

If you find this research and model helpful in your recruitment matching tasks, please cite our technical report:

```text
CRE v1.1 Team. (2025). Domain Adaptation Mechanism of LLM-based Embedding Models in Recruitment Semantic Matching. CareerInternational AI Lab Technical Report.
```