---
library_name: transformers
tags:
- code
---
<div align="center" style="display: flex; justify-content: center; align-items: center; gap: 20px;">
  <a href="https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main/" style="display: flex; align-items: center; text-decoration: none; color: inherit;">
    <img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="30" height="30" style="vertical-align: middle; margin-right: 8px;">
    <span style="font-size: 1.5em; font-weight: bold;">CodeFuse-Embeddings</span>
  </a>
</div>
# Introduction

## C2LLM: Advanced Code Embeddings for Deep Semantic Understanding

**C2LLM (Code Contrastive Large Language Model)** is a powerful new model for generating code embeddings, designed to capture the deep semantics of source code.
- **Intelligent Pooling with PMA**: Instead of traditional `mean pooling` or `last-token pooling`, C2LLM uses **PMA (Pooling by Multi-head Attention)**, which lets the model dynamically focus on the most critical parts of the code, producing a more informative and robust embedding.
- **Trained for Retrieval**: C2LLM was fine-tuned on a massive dataset of **3 million query-document pairs**, optimizing it for real-world code retrieval and semantic search. It supports Text2Code, Code2Code, and Code2Text tasks.

C2LLM is designed to be a go-to model for tasks like code search and Retrieval-Augmented Generation (RAG). For more details, please see our [GitHub repository](https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main).
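The PMA pooling mentioned above can be sketched in a few lines of PyTorch. This is a hypothetical illustration of the general PMA idea (a learned seed query attending over all token states), not C2LLM's actual implementation; all names here are assumptions for the sketch:

```python
import torch
import torch.nn as nn

class PMAPooling(nn.Module):
    """Sketch of Pooling by Multi-head Attention (PMA).

    A single learned "seed" query attends over the token states, so the
    pooled embedding can weight informative tokens instead of averaging
    them uniformly as mean pooling does.
    """

    def __init__(self, hidden_dim: int, num_heads: int = 8):
        super().__init__()
        # One learned seed query -> one pooled vector per sequence.
        self.seed = nn.Parameter(torch.randn(1, 1, hidden_dim))
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_dim)
        batch = token_states.size(0)
        query = self.seed.expand(batch, -1, -1)          # (batch, 1, hidden_dim)
        pooled, _ = self.attn(query, token_states, token_states)
        return pooled.squeeze(1)                          # (batch, hidden_dim)

pooler = PMAPooling(hidden_dim=64)
out = pooler(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 64])
```

Unlike `last-token pooling`, the attention weights are learned jointly with the rest of the model, so the pooled vector is not tied to any single position in the sequence.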
# Model Details

```python
cache = ResultCache("./c2llm_results")
results = mteb.evaluate(model, tasks=tasks, cache=cache, encode_kwargs={"batch_size": 16})
```
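Once embeddings are produced, code search reduces to nearest-neighbor ranking over the corpus. A minimal self-contained sketch with NumPy; the random vectors below are placeholders standing in for real C2LLM embeddings, and `cosine_top_k` is a hypothetical helper, not part of any library:

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=3):
    # Normalize query and documents, then rank by cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    top = np.argsort(-scores)[:k]
    return top, scores[top]

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 64))                # placeholder corpus embeddings
query = docs[42] + 0.01 * rng.normal(size=64)    # near-duplicate of doc 42
idx, scores = cosine_top_k(query, docs, k=1)
print(idx[0])  # 42
```

For large corpora the brute-force matrix product would normally be replaced by an approximate nearest-neighbor index, but the ranking principle is the same.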
## Support Us

If you find this project helpful, please give it a star. It means a lot to us!

[CodeFuse-Embeddings on GitHub](https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main)
## Correspondence to

Jin Qin (qj431428@antgroup.com), Zihan Liao (liaozihan.lzh@antgroup.com), Ziyin Zhang (zhangziying.zzy@antgroup.com), Hang Yu (hyu.hugo@antgroup.com), Peng Di (dipeng.dp@antgroup.com)