+---
+license: apache-2.0
+pipeline_tag: sentence-similarity
+tags:
+- sentence-transformers
+- feature-extraction
+- sentence-similarity
+- transformers
+- qwen
+- recruitment
+- LoRA
+base_model:
+- Qwen/Qwen3-Embedding-8B
+---
+# CRE v1.1: CareerInternational Recruitment Embedding Model 🚀
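+> **CRE v1.1** is an LLM-based embedding model adapted to the recruitment domain. Compared with traditional BERT-style models, it combines long-context fusion with instruction control to produce stronger semantic representations, and it is designed to align heterogeneous text: job descriptions (JDs) and résumés (CVs).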
+---
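+### 📖 Technical Background (Technical Report Summary)
+**2025/06/28: Released the CRE v1.1 model and technical report.** This work studies how LLM-based embedding models adapt to the recruitment domain for semantic matching. Its main findings are:
+1. **The adaptation training recipe is effective**: **LoRA lightweight fine-tuning** combined with **domain-specific synthetic data** significantly improves performance on the three core matching tasks: JD2JD, JD2CV, and CV2CV.
+2. **A new direction for the technology**: LLM-based embeddings natively support multi-granularity semantic parsing (for example, capturing hypernym/hyponym relations between skills), which avoids structural bottlenecks of traditional models.
+3. **Value for industrial deployment**: When **Enhanced Query Construction** is used during training and raw queries are used directly at test time, the model remains robust and practical.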
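To give a rough sense of why LoRA counts as "lightweight", the sketch below compares trainable-parameter counts for a single linear layer under full fine-tuning and under a rank-`r` LoRA update. The layer size and rank are hypothetical values chosen for illustration; the report summary above does not state the configuration actually used for CRE v1.1.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_out * d_in            # full fine-tuning: every entry of W is trainable
    lora = rank * (d_in + d_out)   # LoRA: A is (rank x d_in), B is (d_out x rank)
    return full, lora


# Hypothetical layer size and rank, for illustration only.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=16)
print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (rank 16):   {lora:,} parameters ({lora / full:.2%} of full)")
```

The frozen base weight is left untouched and only the two small factor matrices are trained, which is where the lower cost comes from.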
+---
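+### Key Features
+* **Domain Adaptation**: Built around **LoRA + synthetic data**, offering an efficient, low-cost, and reliable path to deploying embeddings in complex recruitment scenarios.
+* **Heterogeneous Alignment**: Maps between JDs and résumés despite their information asymmetry and differing writing conventions.
+* **Multi-Granularity Semantic Parsing**: Captures hierarchical and evolutionary relations between skills, supporting more precise person-job matching.
+* **Robust by Design**: Performance remains stable even when training and test queries are not in the same form.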
+---
+### Quick Start
+#### 1. Dependencies
+```bash
+pip install "transformers>=4.51.0" "sentence-transformers>=2.7.0"
+```
+#### 2. Using Sentence-Transformers
+```python
+from sentence_transformers import SentenceTransformer
+model = SentenceTransformer("JayThinkDiff/CRE-1.1")
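+# The example inputs are kept in Chinese; an English gloss precedes each string.
+queries = [
+    # "We need a Java expert with experience building large-scale distributed systems"
+    "我们需要一名具备大型分布式系统开发经验的 Java 专家",
+    # "Looking for an algorithm engineer proficient in PyTorch and LLM fine-tuning"
+    "寻找精通 PyTorch 和大模型微调的算法工程师",
+]
+documents = [
+    # "Candidate A: 8 years of Java development; led financial-grade distributed middleware R&D at a major tech company; proficient in Spring Cloud."
+    "候选人 A：8 年 Java 开发经验，曾主导某大厂金融级分布式中间件研发，精通 Spring Cloud。",
+    # "Candidate B: master's degree in computer science; research focus on NLP; experienced with LoRA fine-tuning of LLMs in PyTorch."
+    "候选人 B：计算机硕士，研究方向为自然语言处理，熟练使用 PyTorch 进行 LLM 的 LoRA 微调。",
+]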
+query_embeddings = model.encode(queries)
+document_embeddings = model.encode(documents)
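+# Compute cosine similarity between every query and every document
+# (model.similarity requires sentence-transformers >= 3.0)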
+similarity = model.similarity(query_embeddings, document_embeddings)
+print(similarity)
+```
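The matrix printed above has one row per query and one column per document. As a minimal, dependency-free sketch of what that score means and how it can be turned into a ranking, the snippet below computes cosine similarity by hand on toy vectors. The helper names and the 3-dimensional vectors are illustrative assumptions, not part of the CRE or Sentence-Transformers API.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for real embeddings (hypothetical values).
toy_query = [1.0, 0.0, 1.0]
toy_docs = [[0.9, 0.1, 0.8], [0.0, 1.0, 0.0]]
ranking = rank_documents(toy_query, toy_docs)
print(ranking)
```

In a real JD2CV setting, the same ranking step would be applied to each row of the similarity matrix to order candidate résumés for a given job description.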
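+### 🛠️ Technical Specifications
+* **Pooling Strategy**: Use the model's default pooling configuration (typically last-token or CLS pooling; the Qwen3-Embedding base model uses last-token pooling).
+* **Inference Optimization**: For long texts, enabling `flash_attention_2` and setting `padding_side="left"` is strongly recommended.
+* **Task Support**: Optimized for recruitment-domain tasks such as JD2JD, JD2CV, and CV2CV.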
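One way the inference recommendation above could be applied is through the `model_kwargs` and `tokenizer_kwargs` arguments of `SentenceTransformer`. The sketch below is an assumption about how to wire this up, not the authors' documented setup: the helper names `build_load_kwargs` and `load_cre` are hypothetical, and FlashAttention 2 additionally assumes a supported GPU, the `flash-attn` package, and half-precision weights.

```python
def build_load_kwargs(use_flash_attention: bool = True) -> dict:
    """Build keyword arguments for SentenceTransformer (hypothetical helper)."""
    model_kwargs = {}
    if use_flash_attention:
        # Forwarded to the underlying transformers model loader.
        model_kwargs["attn_implementation"] = "flash_attention_2"
    return {
        "model_kwargs": model_kwargs,
        # Left padding keeps each sequence's final token in the last position,
        # which is what last-token pooling reads.
        "tokenizer_kwargs": {"padding_side": "left"},
    }


def load_cre(model_id: str = "JayThinkDiff/CRE-1.1", use_flash_attention: bool = True):
    """Load the model with the recommended settings (downloads the weights)."""
    # Imported lazily so the helper above can be used without the library installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id, **build_load_kwargs(use_flash_attention))
```

If FlashAttention 2 is not available in the environment, calling `load_cre(use_flash_attention=False)` keeps the left-padding setting and falls back to the default attention implementation.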
+---
+### 📜 Citation
+If you find this research and model helpful in your recruitment matching tasks, please cite our technical report:
+```text
+CRE v1.1 Team. (2025). Domain Adaptation Mechanism of LLM-based Embedding Models in Recruitment Semantic Matching. CareerInternational AI Lab Technical Report.
+```