Tags: Feature Extraction, Transformers, Safetensors, sentence-transformers, Chinese, English, c2llm, code, custom_code
Instructions to use codefuse-ai/C2LLM-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use codefuse-ai/C2LLM-7B with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("feature-extraction", model="codefuse-ai/C2LLM-7B", trust_remote_code=True)

    # Load model directly
    from transformers import AutoModel

    model = AutoModel.from_pretrained("codefuse-ai/C2LLM-7B", trust_remote_code=True, dtype="auto")
- sentence-transformers
How to use codefuse-ai/C2LLM-7B with sentence-transformers:
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("codefuse-ai/C2LLM-7B", trust_remote_code=True)

    sentences = [
        "The weather is lovely today.",
        "It's so sunny outside!",
        "He drove to the stadium."
    ]
    embeddings = model.encode(sentences)

    similarities = model.similarity(embeddings, embeddings)
    print(similarities.shape)
    # [3, 3]
- Notebooks
- Google Colab
- Kaggle
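The `model.similarity` call in the sentence-transformers snippet above returns a pairwise score matrix. For retrieval, one would typically rank candidate snippets by their score against a query. The following is a minimal sketch of that ranking step, assuming cosine similarity (the usual sentence-transformers default, though a model can configure a different function). The helper names `cosine` and `rank_by_similarity` and the tiny hand-written vectors are illustrative stand-ins, not part of the C2LLM API or real C2LLM embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical 3-dimensional embeddings; real ones would come from model.encode(...).
query = [1.0, 0.0, 1.0]
docs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_by_similarity(query, docs)
print(ranking)
```

In practice the query would be a natural-language description or a code snippet and the documents a corpus of code, both encoded with the same model.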
Update README.md
README.md CHANGED
@@ -9,6 +9,13 @@ tags:
 ---
 # Introduction
 
 C2LLM: Advanced Code Embeddings for Deep Semantic Understanding
 
 **C2LLMs (Code Contrastive Large Language Model)** is a powerful new model for generating code embeddings, designed to capture the deep semantics of source code.
@@ -19,7 +26,7 @@ C2LLM: Advanced Code Embeddings for Deep Semantic Understanding
 - **Intelligent Pooling with PMA**: Instead of traditional `mean pooling` or `last token pooling`, C2LLM uses **PMA (Pooling by Multi-head Attention)**. This allows the model to dynamically focus on the most critical parts of the code, creating a more informative and robust embedding.
 - **Trained for Retrieval**: C2LLM was fine-tuned on a massive dataset of **3 million query-document pairs**, optimizing it for real-world code retrieval and semantic search tasks. Supporting Text2Code/Code2Code/Code2Text tasks.
 
-C2LLM is designed to be a go-to model for tasks like code search and Retrieval-Augmented Generation (RAG).
 
 # Model Details
 
@@ -128,6 +135,12 @@ cache = ResultCache("./c2llm_results")
 results = mteb.evaluate(model, tasks=tasks, cache=cache, encode_kwargs={"batch_size": 16})
 ```
 
 ## Correspondence to
 
 Jin Qin (qj431428@antgroup.com), Zihan Liao (liaozihan.lzh@antgroup.com), Ziyin Zhang (zhangziying.zzy@antgroup.com), Hang Yu (hyu.hugo@antgroup.com), Peng Di (dipeng.dp@antgroup.com)
@@ -9,6 +9,13 @@ tags:
 ---
 # Introduction
 
+<h1 align="center">
+<a href="https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main/">
+<img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="30" height="30" style="vertical-align: middle; margin-right: 8px;">
+CodeFuse-Embeddings
+</a>
+</h1>
+
 C2LLM: Advanced Code Embeddings for Deep Semantic Understanding
 
 **C2LLMs (Code Contrastive Large Language Model)** is a powerful new model for generating code embeddings, designed to capture the deep semantics of source code.
@@ -19,7 +26,7 @@ C2LLM: Advanced Code Embeddings for Deep Semantic Understanding
 - **Intelligent Pooling with PMA**: Instead of traditional `mean pooling` or `last token pooling`, C2LLM uses **PMA (Pooling by Multi-head Attention)**. This allows the model to dynamically focus on the most critical parts of the code, creating a more informative and robust embedding.
 - **Trained for Retrieval**: C2LLM was fine-tuned on a massive dataset of **3 million query-document pairs**, optimizing it for real-world code retrieval and semantic search tasks. Supporting Text2Code/Code2Code/Code2Text tasks.
 
+C2LLM is designed to be a go-to model for tasks like code search and Retrieval-Augmented Generation (RAG). For more details, please see our [GitHub repository](https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main).
 
 # Model Details
 
@@ -128,6 +135,12 @@ cache = ResultCache("./c2llm_results")
 results = mteb.evaluate(model, tasks=tasks, cache=cache, encode_kwargs={"batch_size": 16})
 ```
 
+## Support Us
+
+If you find this project helpful, please give it a star. It means a lot to us!
+
+[](https://github.com/codefuse-ai/CodeFuse-Embeddings/tree/main)
+
 ## Correspondence to
 
 Jin Qin (qj431428@antgroup.com), Zihan Liao (liaozihan.lzh@antgroup.com), Ziyin Zhang (zhangziying.zzy@antgroup.com), Hang Yu (hyu.hugo@antgroup.com), Peng Di (dipeng.dp@antgroup.com)
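The README text in this diff contrasts PMA (Pooling by Multi-head Attention) with mean pooling and last-token pooling. The sketch below illustrates the general idea only and is not C2LLM's implementation: a single query vector attends over all token hidden states, head by head, and the attention-weighted sums form the pooled embedding. It assumes identity projections and omits the residual connections, layer normalization, and feed-forward layers that a full PMA block would normally include. The function name `attention_pool` and all values are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tokens, query, num_heads):
    """Pool token vectors into one vector using multi-head attention.

    tokens: list of token hidden states, each a list of d floats
    query:  one query vector of length d (learned in a real model, fixed here)
    Projections are treated as identity to keep the sketch short (an assumption).
    """
    d = len(query)
    assert d % num_heads == 0, "hidden size must be divisible by num_heads"
    head_dim = d // num_heads
    pooled = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = query[lo:hi]
        # Scaled dot-product score of the query against every token, for this head.
        scores = [
            sum(a * b for a, b in zip(q, t[lo:hi])) / math.sqrt(head_dim)
            for t in tokens
        ]
        weights = softmax(scores)
        # Attention-weighted sum of the token slices belonging to this head.
        for j in range(lo, hi):
            pooled.append(sum(w * t[j] for w, t in zip(weights, tokens)))
    return pooled


def mean_pool(tokens):
    """Plain mean pooling, for comparison."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


# Hypothetical hidden states for three tokens with hidden size 4.
tokens = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 2.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]

uniform = attention_pool(tokens, [0.0, 0.0, 0.0, 0.0], num_heads=2)
focused = attention_pool(tokens, [10.0, 0.0, 0.0, 0.0], num_heads=2)
print(uniform)
print(focused)
```

With an all-zero query every token receives the same weight, so the result coincides with mean pooling. A non-zero query shifts weight toward the tokens it matches, which is the "dynamic focus" the README describes.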