---
library_name: transformers
tags:
- code
---
<div align="center" style="display: flex; justify-content: center; align-items: center; gap: 20px;">
  <a href="https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main/" style="display: flex; align-items: center; text-decoration: none; color: inherit;">
    <img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="30" height="30" style="vertical-align: middle; margin-right: 8px;">
    <span style="font-size: 1.5em; font-weight: bold;">CodeFuse-Embeddings</span>
  </a>
</div>
# Introduction

## C2LLM: Advanced Code Embeddings for Deep Semantic Understanding

**C2LLM (Code Contrastive Large Language Model)** is a powerful new model for generating code embeddings, designed to capture the deep semantics of source code.
- **Intelligent Pooling with PMA**: Instead of traditional `mean pooling` or `last-token pooling`, C2LLM uses **PMA (Pooling by Multi-head Attention)**, which lets the model dynamically focus on the most critical parts of the code, producing a more informative and robust embedding.
- **Trained for Retrieval**: C2LLM was fine-tuned on a massive dataset of **3 million query-document pairs**, optimizing it for real-world code retrieval and semantic search. It supports Text2Code, Code2Code, and Code2Text tasks.

C2LLM is designed to be a go-to model for tasks like code search and Retrieval-Augmented Generation (RAG). For more details, please see our [GitHub repository](https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main).
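The PMA pooling mentioned above can be sketched in a few lines of PyTorch. This is a hypothetical illustration of the general PMA idea (a learned seed query attending over all token states), not C2LLM's actual implementation; all names here are assumptions for the sketch:

```python
import torch
import torch.nn as nn

class PMAPooling(nn.Module):
    """Sketch of Pooling by Multi-head Attention (PMA).

    A single learned "seed" query attends over the token states, so the
    pooled embedding can weight informative tokens instead of averaging
    them uniformly as mean pooling does.
    """

    def __init__(self, hidden_dim: int, num_heads: int = 8):
        super().__init__()
        # One learned seed query -> one pooled vector per sequence.
        self.seed = nn.Parameter(torch.randn(1, 1, hidden_dim))
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_dim)
        batch = token_states.size(0)
        query = self.seed.expand(batch, -1, -1)          # (batch, 1, hidden_dim)
        pooled, _ = self.attn(query, token_states, token_states)
        return pooled.squeeze(1)                          # (batch, hidden_dim)

pooler = PMAPooling(hidden_dim=64)
out = pooler(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 64])
```

Unlike `last-token pooling`, the attention weights are learned jointly with the rest of the model, so the pooled vector is not tied to any single position in the sequence.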
# Model Details

```python
cache = ResultCache("./c2llm_results")
results = mteb.evaluate(model, tasks=tasks, cache=cache, encode_kwargs={"batch_size": 16})
```
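Once embeddings are produced, code search reduces to nearest-neighbor ranking over the corpus. A minimal self-contained sketch with NumPy; the random vectors below are placeholders standing in for real C2LLM embeddings, and `cosine_top_k` is a hypothetical helper, not part of any library:

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=3):
    # Normalize query and documents, then rank by cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    top = np.argsort(-scores)[:k]
    return top, scores[top]

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 64))                # placeholder corpus embeddings
query = docs[42] + 0.01 * rng.normal(size=64)    # near-duplicate of doc 42
idx, scores = cosine_top_k(query, docs, k=1)
print(idx[0])  # 42
```

For large corpora the brute-force matrix product would normally be replaced by an approximate nearest-neighbor index, but the ranking principle is the same.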
## Support Us

If you find this project helpful, please give it a star. It means a lot to us!

[CodeFuse-Embeddings on GitHub](https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main)
## Correspondence to

Jin Qin (qj431428@antgroup.com), Zihan Liao (liaozihan.lzh@antgroup.com), Ziyin Zhang (zhangziying.zzy@antgroup.com), Hang Yu (hyu.hugo@antgroup.com), Peng Di (dipeng.dp@antgroup.com)