Update README.md
README.md CHANGED
@@ -9,6 +9,13 @@ tags:
---
# Introduction
+<h1 align="center">
+  <a href="https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main/">
+    <img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="30" height="30" style="vertical-align: middle; margin-right: 8px;">
+    CodeFuse-Embeddings
+  </a>
+</h1>
+
C2LLM: Advanced Code Embeddings for Deep Semantic Understanding

**C2LLM (Code Contrastive Large Language Model)** is a powerful new model for generating code embeddings, designed to capture the deep semantics of source code.

@@ -19,7 +26,7 @@ C2LLM: Advanced Code Embeddings for Deep Semantic Understanding
- **Intelligent Pooling with PMA**: Instead of traditional `mean pooling` or `last-token pooling`, C2LLM uses **PMA (Pooling by Multi-head Attention)**, which lets the model dynamically focus on the most informative parts of the code and produce a richer, more robust embedding (a minimal illustrative sketch appears at the end of this README).
- **Trained for Retrieval**: C2LLM was fine-tuned on a dataset of **3 million query-document pairs**, optimizing it for real-world code retrieval and semantic search across Text2Code, Code2Code, and Code2Text tasks.

-C2LLM is designed to be a go-to model for tasks like code search and Retrieval-Augmented Generation (RAG).
+C2LLM is designed to be a go-to model for tasks like code search and Retrieval-Augmented Generation (RAG). For more details, please see our [GitHub repository](https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main).

# Model Details
@@ -128,6 +135,12 @@ cache = ResultCache("./c2llm_results")
results = mteb.evaluate(model, tasks=tasks, cache=cache, encode_kwargs={"batch_size": 16})
```

+## Support Us
+
+If you find this project helpful, please give it a star. It means a lot to us!
+
+[](https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main)
+
## Correspondence to
Jin Qin (qj431428@antgroup.com), Zihan Liao (liaozihan.lzh@antgroup.com), Ziyin Zhang (zhangziying.zzy@antgroup.com), Hang Yu (hyu.hugo@antgroup.com), Peng Di (dipeng.dp@antgroup.com)
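As a point of reference for the **Intelligent Pooling with PMA** bullet above, here is a minimal, generic sketch of attention-based pooling in the spirit of PMA (Pooling by Multi-head Attention). It is illustrative only and is not the C2LLM implementation; the class name, hidden size, and head count are assumptions.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Toy PMA-style pooling: one learnable query attends over all token states."""

    def __init__(self, hidden_size: int, num_heads: int = 8):
        super().__init__()
        # Learnable "seed" query; its attention output becomes the embedding.
        self.seed = nn.Parameter(torch.randn(1, 1, hidden_size))
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, token_states, padding_mask=None):
        # token_states: (batch, seq_len, hidden); padding_mask: True where padded.
        query = self.seed.expand(token_states.size(0), -1, -1)
        pooled, _ = self.attn(query, token_states, token_states,
                              key_padding_mask=padding_mask)
        return pooled.squeeze(1)  # (batch, hidden) code embedding

# Unlike mean or last-token pooling, the learned attention weights decide which
# tokens contribute most to the final embedding.
states = torch.randn(2, 128, 1024)            # stand-in for LLM hidden states
embedding = AttentionPooling(1024)(states)    # shape: (2, 1024)
```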
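The retrieval-oriented training described above comes down to nearest-neighbour search over embeddings at inference time. The snippet below sketches a toy Text2Code search with cosine similarity; `encode` stands in for whatever call produces C2LLM embeddings (for example, the `model` object used with `mteb.evaluate` above) and is a hypothetical placeholder, not a documented API.

```python
import numpy as np

def top_k_by_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3):
    """Rank candidate code snippets against a query embedding by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)[:k]
    return order, scores[order]

# With real embeddings you would do something like:
#   query_vec = encode(["read a JSON file into a dict"])[0]   # hypothetical encode()
#   doc_vecs  = encode(code_snippets)
# Random vectors are used here only to show the mechanics.
rng = np.random.default_rng(0)
query_vec, doc_vecs = rng.normal(size=768), rng.normal(size=(5, 768))
indices, sims = top_k_by_cosine(query_vec, doc_vecs)
print(indices, sims)
```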