Sentence Similarity
sentence-transformers
Safetensors
Transformers
qwen3
text-generation
feature-extraction
qwen
recruitment
LoRA
text-embeddings-inference
Instructions to use JayThinkDiff/CRE-1.1 with libraries, inference providers, notebooks, and local apps.
- Libraries
- sentence-transformers
How to use JayThinkDiff/CRE-1.1 with sentence-transformers:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("JayThinkDiff/CRE-1.1")

sentences = [
    "That is a happy person",
    "That is a happy dog",
    "That is a very happy person",
    "Today is a sunny day",
]
embeddings = model.encode(sentences)

# Pairwise similarity scores between all sentence embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [4, 4]
```
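`model.similarity` returns one score per pair of embeddings, which is why four sentences give a 4 x 4 result. Sentence Transformers uses cosine similarity by default unless a model's configuration selects another function. Assuming that default applies to this model, the matrix corresponds to the dependency-free sketch below. The short toy vectors stand in for real `model.encode` output, and the helper names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity dot(u, v) / (|u| * |v|); assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(xs: List[Vector], ys: List[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities with shape [len(xs), len(ys)]."""
    return [[cosine(x, y) for y in ys] for x in xs]


# Toy 3-dimensional vectors standing in for model.encode(sentences).
embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [1.0, 0.1, 0.0],
    [0.0, 0.0, 1.0],
]
similarities = similarity_matrix(embeddings, embeddings)
print(len(similarities), len(similarities[0]))  # 4 4
```

The diagonal is 1.0 because each vector is compared with itself, and the matrix is symmetric.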
- Transformers

How to use JayThinkDiff/CRE-1.1 with Transformers:

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("JayThinkDiff/CRE-1.1")
model = AutoModelForCausalLM.from_pretrained("JayThinkDiff/CRE-1.1")
```
Update README.md
README.md
CHANGED
@@ -15,20 +15,22 @@ base_model:
- Qwen/Qwen3-Embedding-8B
---

-# CRE

-> **CRE

---

-###

1. **Effectiveness of the adaptation training paradigm**: Combining **lightweight LoRA fine-tuning** with **synthetic domain data** significantly improved the model's performance on the three core matching tasks: JD2JD, JD2CV, and CV2CV.
2. **An emerging technical trend**: LLM-based embeddings natively support semantic parsing at multiple granularities (for example, capturing hypernym/hyponym relations between skills), avoiding the structural bottlenecks of traditional models.
3. **Value for industrial deployment**: With **Enhanced Query Construction** used during training and raw queries used directly at test time, the model proved highly robust and practical.
---

### Key Features

@@ -40,13 +42,7 @@ base_model:

---

-### Quick Start
-#### 1. Dependencies
-```bash
-pip install "transformers>=4.51.0" "sentence-transformers>=2.7.0"
### Using Sentence-Transformers
```python
from sentence_transformers import SentenceTransformer, util

@@ -62,22 +58,14 @@ print("查询结果:", util.cos_sim(query_embedding, passage_embedding))
```
### 📊 Expected Output Comparison

-| Model
-| :---
-| **CRE-1.1**
-| **Qwen3-Embedding-8B** | **0.7731**

### 🛠️ Technical Specifications

-* **Pooling Strategy**: We recommend using the model's default representation (
-* **Inference Optimization**: When processing long texts, we strongly recommend enabling `flash_attention_2` and setting `padding_side="left"`.
* **Task Support**: Deeply optimized for recruitment tasks such as JD2JD, JD2CV, and CV2CV.

----
-### 📜 Citation
-If you find this research and model helpful in your recruitment matching tasks, please cite our technical report:
-```text
-CRE v1.1 Team. (2025). Domain Adaptation Mechanism of LLM-based Embedding Models in Recruitment Semantic Matching. CareerInternational AI Lab Technical Report.
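
- Qwen/Qwen3-Embedding-8B
---

# CRE: CareerInternational Recruitment Embedding Model 🚀

> **CRE-1.1** is an LLM-based embedding model adapted to the recruitment domain. Compared with traditional BERT-style models, it uses long-context fusion and instruction control to produce stronger semantic representations, and it targets the difficult problem of aligning heterogeneous texts: job descriptions (JDs) and résumés (CVs).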
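
---

### Release Notes
* **2026/06/28**: Released **CRE-1.1**, with improved long-text feature extraction and inference performance.
* **2025/03/28**: Released the initial **CRE-0.5** version and its technical report.

### 📖 Technical Report Summary

This study examines the domain adaptation mechanism of LLM-based embedding models in recruitment semantic matching. Its core findings are:
1. **Effectiveness of the adaptation training paradigm**: Combining **lightweight LoRA fine-tuning** with **synthetic domain data** significantly improves the model's performance on the three core matching tasks: JD2JD, JD2CV, and CV2CV (JD-to-JD, JD-to-CV, and CV-to-CV matching).
2. **An emerging technical trend**: LLM-based embeddings natively support semantic parsing at multiple granularities (for example, capturing hypernym/hyponym relations between skills), avoiding the structural bottlenecks of traditional models.
3. **Value for industrial deployment**: When the model is trained with **Enhanced Query Construction** and then tested directly on raw queries, it remains highly robust and practical.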
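Finding 1 credits lightweight LoRA fine-tuning. The summary does not give CRE's LoRA settings, so what follows is only a generic, dependency-free sketch of the standard LoRA formulation, h = W0·x + (alpha / r)·B·A·x. The pretrained weight W0 stays frozen and only the low-rank factors A and B are trained. B starts at zero, so the adapted layer initially behaves exactly like the base layer. The layer size, rank, and alpha below are illustrative assumptions and are not CRE's training configuration.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product for a row-major list-of-lists matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Frozen weight W0 plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, w0: Matrix, r: int, alpha: float, seed: int = 0) -> None:
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        self.w0 = w0  # frozen pretrained weight, d_out x d_in
        # A: r x d_in with small random init; B: d_out x r with zero init
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [h + self.scale * d for h, d in zip(base, delta)]

    def trainable_params(self) -> int:
        """Only A and B are trained; W0 is not counted."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


# Hypothetical 64 x 64 layer (identity weight) adapted with rank 4.
d = 64
w0 = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
layer = LoRALinear(w0, r=4, alpha=8.0)
x = [float(i) for i in range(d)]

print(layer.forward(x) == x)            # True: B is zero, so no change yet
print(layer.trainable_params(), d * d)  # 512 4096
```

Even in this toy layer, the adapter trains 512 parameters against 4,096 in the full weight matrix. That gap is what makes LoRA "lightweight".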

---

### Key Features

---

### Using Sentence-Transformers
```python
from sentence_transformers import SentenceTransformer, util
```
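The Technical Report Summary says CRE was trained with Enhanced Query Construction but evaluated on raw queries. This page does not reproduce the report's actual template. The sketch below assumes the `Instruct: {task}\nQuery:{query}` format documented for the base model, Qwen3-Embedding. The task description and job description are hypothetical and serve only as illustration.

```python
from typing import Optional


def build_query(query: str, task: Optional[str] = None) -> str:
    """Return the raw query, or an instruction-prefixed query when a task is given.

    Assumption: the 'Instruct: ...\\nQuery:...' template follows the
    Qwen3-Embedding model card; CRE's own Enhanced Query Construction may differ.
    """
    if task is None:
        return query  # raw query, the test-time setting in the report summary
    return f"Instruct: {task}\nQuery:{query}"


# Hypothetical task description and JD, for illustration only.
task = "Given a job description, retrieve resumes of qualified candidates"
jd = "Senior backend engineer: Go, Kubernetes, 5+ years of experience"

raw_query = build_query(jd)
enhanced_query = build_query(jd, task)
print(enhanced_query)
```

Either string can be passed to `model.encode`. Sentence Transformers' `encode` also accepts a `prompt` argument that is prepended to every input, which avoids building the prefix by hand.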
### 📊 Expected Output Comparison

| Model | Similarity to CV 1 | Similarity to CV 2 |
| :--- | :---: | :---: |
| **CRE-1.1** | 0.5816 | **0.6093** |
| **Qwen3-Embedding-8B** | **0.7731** | 0.7638 |
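The table reports raw cosine scores. Different embedding models generally produce scores on different scales, so the absolute values are not directly comparable between CRE-1.1 and Qwen3-Embedding-8B. What matters for matching is which CV each model ranks first for the same query. A small sketch that reads the table this way:

```python
# Cosine scores copied from the comparison table above.
scores = {
    "CRE-1.1": {"CV 1": 0.5816, "CV 2": 0.6093},
    "Qwen3-Embedding-8B": {"CV 1": 0.7731, "CV 2": 0.7638},
}

top = {}
for model, by_cv in scores.items():
    # Rank CVs by score, highest first; the margin is the gap between first and second.
    ranking = sorted(by_cv, key=by_cv.get, reverse=True)
    margin = by_cv[ranking[0]] - by_cv[ranking[1]]
    top[model] = (ranking[0], margin)
    print(f"{model}: ranks {ranking[0]} first (margin {margin:.4f})")
```

CRE-1.1 ranks CV 2 first by a margin of 0.0277, while Qwen3-Embedding-8B ranks CV 1 first by 0.0093. The query and CV texts behind these numbers are not shown here, so this table alone cannot tell us which ranking is preferable.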

### 🛠️ Technical Specifications

* **Pooling Strategy**: Use the model's default representation (last-token pooling).
* **Task Support**: Deeply optimized for recruitment tasks such as JD2JD, JD2CV, and CV2CV.
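Last-token pooling takes each sequence's hidden state at its final non-padding position and uses it as the embedding. The dependency-free sketch below assumes hidden states shaped `[batch][seq_len][dim]` and a 0/1 attention mask. With Transformers, these would come from the model output's `last_hidden_state` and the tokenizer's `attention_mask`. The function name is hypothetical. Under left padding, the last real token always sits at the final position, which keeps batched last-token pooling simple. The toy batch mixes one left-padded and one right-padded sequence purely to exercise both cases.

```python
from typing import List


def last_token_pool(hidden: List[List[List[float]]], mask: List[List[int]]) -> List[List[float]]:
    """Return the hidden state of the last real (mask == 1) token of each sequence.

    hidden: [batch][seq_len][dim]; mask: [batch][seq_len] with 1 = token, 0 = padding.
    Handles both left- and right-padded sequences.
    """
    pooled = []
    for states, m in zip(hidden, mask):
        positions = [i for i, flag in enumerate(m) if flag == 1]
        if not positions:
            raise ValueError("sequence contains only padding")
        pooled.append(states[positions[-1]])
    return pooled


# Toy batch: 2 sequences x 3 positions x 2-dim hidden states.
hidden = [
    [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]],  # left-padded:  pad, tok, tok
    [[5.0, 6.0], [7.0, 8.0], [0.0, 0.0]],  # right-padded: tok, tok, pad
]
mask = [[0, 1, 1], [1, 1, 0]]

print(last_token_pool(hidden, mask))  # [[3.0, 4.0], [7.0, 8.0]]
```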

---