Update README.md
README.md
CHANGED
@@ -14,14 +14,14 @@ license: cc-by-nc-4.0
 <b>The code embedding model trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
 </p>
 
-# Jina Embeddings
+# Jina Code Embeddings: A Small but Performant Code Embedding Model
 
 ## Intended Usage & Model Info
-`jina-embeddings
+`jina-code-embeddings` is an embedding model for code retrieval.
 The model supports various types of code retrieval (text-to-code, code-to-code, code-to-text, code-to-completion) and technical question answering across 15+ programming languages.
 
 
-Built on [Qwen/Qwen2.5-Coder-0.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B), `jina-embeddings-
+Built on [Qwen/Qwen2.5-Coder-0.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B), `jina-code-embeddings-0.5b` features:
 
 - **Multilingual support** (15+ programming languages) and compatibility with a wide range of domains, including web development, software development, machine learning, data science, and educational coding problems.
 - **Task-specific instruction prefixes** for NL2Code, Code2Code, Code2NL, Code2Completion, and Technical QA, which can be selected at inference time.
@@ -30,7 +30,7 @@ Built on [Qwen/Qwen2.5-Coder-0.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5
 
 Summary of features:
 
-| Feature | Jina Embeddings
+| Feature | Jina Code Embeddings 0.5B |
 |------------|------------|
 | Base Model | Qwen2.5-Coder-0.5B |
 | Supported Tasks | `nl2code`, `code2code`, `code2nl`, `code2completion`, `qa` |
@@ -66,7 +66,7 @@ from transformers import AutoModel
 import torch
 
 # Initialize the model
-model = AutoModel.from_pretrained("jinaai/jina-embeddings-
+model = AutoModel.from_pretrained("jinaai/jina-code-embeddings-0.5b", trust_remote_code=True)
 model.to("cuda")
 
 # Configure truncate_dim, max_length, batch_size in the encode function if needed
@@ -98,7 +98,7 @@ from sentence_transformers import SentenceTransformer
 
 # Load the model
 model = SentenceTransformer(
-    "jinaai/jina-embeddings-
+    "jinaai/jina-code-embeddings-0.5b",
     model_kwargs={
         "torch_dtype": torch.bfloat16,
         "attn_implementation": "flash_attention_2",
@@ -129,7 +129,7 @@ print(similarity)
 
 ## Training & Evaluation
 
-Please refer to our technical report of jina-embeddings
+Please refer to our technical report of jina-code-embeddings for training details and benchmarks.
 
 ## Contact
 
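The usage snippets in this diff mention a `truncate_dim` option and end by printing a similarity score, but the hunks do not show what either step does to the vectors. The sketch below illustrates one common reading, under the assumption (not confirmed by the README text shown here) that truncation keeps the leading components of an embedding and re-normalizes them, as in Matryoshka-style embeddings. The helper names `l2_normalize`, `truncate_embedding`, and `cosine_similarity` are hypothetical and are not part of the model's API. The example uses only the Python standard library, so it runs without downloading the model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and re-normalize.

    Assumption: this mirrors what a `truncate_dim` setting does for
    Matryoshka-style embeddings; the model's own implementation may differ.
    """
    if dim <= 0 or dim > len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    return l2_normalize(vec[:dim])


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


if __name__ == "__main__":
    # Toy 4-dimensional vectors standing in for real query/code embeddings.
    query = [0.5, 0.5, 0.1, -0.2]
    code = [0.4, 0.6, -0.3, 0.1]
    print("full:", cosine_similarity(query, code))
    print(
        "truncated to 2 dims:",
        cosine_similarity(truncate_embedding(query, 2), truncate_embedding(code, 2)),
    )
```

Truncated embeddings trade some retrieval quality for smaller indexes, so it may be worth comparing scores at full and reduced dimensionality on a sample of real queries before settling on a value.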