|
--- |
|
base_model: |
|
- Qwen/Qwen2.5-Coder-1.5B |
|
license: cc-by-nc-4.0 |
|
tags: |
|
- feature-extraction |
|
- mteb |
|
- sentence-transformers |
|
- text-embeddings-inference |
|
inference: false |
|
library_name: transformers |
|
pipeline_tag: feature-extraction |
|
--- |
|
|
|
<br><br> |
|
|
|
<p align="center"> |
|
<img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px"> |
|
</p> |
|
|
|
<p align="center"> |
|
<b>The code embedding model trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b> |
|
</p> |
|
|
|
# Jina Code Embeddings: A Small but Performant Code Embedding Model |
|
|
|
## Intended Usage & Model Info |
|
`jina-code-embeddings` is an embedding model for code retrieval. |
|
The model supports various types of code retrieval (text-to-code, code-to-code, code-to-text, code-to-completion) and technical question answering across 15+ programming languages. |
|
|
|
|
|
Built on [Qwen/Qwen2.5-Coder-1.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B), `jina-code-embeddings-1.5b` features: |
|
|
|
- **Multilingual support** (15+ programming languages) with coverage of a wide range of domains, including web development, software development, machine learning, data science, and educational coding problems.
|
- **Task-specific instruction prefixes** for NL2Code, Code2Code, Code2NL, Code2Completion, and Technical QA, which can be selected at inference time. |
|
- **Flexible embedding size**: dense embeddings are 1536-dimensional by default but can be truncated to as few as 128 dimensions with minimal performance loss (see the truncation sketch after the feature table below).
|
|
|
|
|
Summary of features: |
|
|
|
| Feature | Jina Code Embeddings 1.5B | |
|
|------------|------------| |
|
| Base Model | Qwen2.5-Coder-1.5B | |
|
| Supported Tasks | `nl2code`, `code2code`, `code2nl`, `code2completion`, `qa` | |
|
| Model DType | BFloat16 |
|
| Max Sequence Length | 32768 | |
|
| Embedding Vector Dimension | 1536 | |
|
| Matryoshka dimensions | 128, 256, 512, 1024, 1536 | |
|
| Pooling Strategy | Last-token pooling | |
|
| Attention Mechanism | FlashAttention2 | |
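
Because the output is Matryoshka-style, the 1536-dimensional embeddings can be shortened to any of the sizes listed above. Below is a minimal sketch of the truncation step; random tensors stand in for real embeddings and `k = 256` is just an example: keep the first `k` dimensions and re-normalize before computing similarities.

```python
import torch
import torch.nn.functional as F

# Stand-in for a batch of 1536-dimensional embeddings produced by the model
embeddings = torch.randn(4, 1536)

# Keep the first k dimensions (one of the supported Matryoshka sizes) and re-normalize
k = 256
truncated = F.normalize(embeddings[:, :k], p=2, dim=1)
print(truncated.shape)  # torch.Size([4, 256])
```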
|
|
|
## Usage |
|
|
|
<details> |
|
<summary>Requirements</summary>
|
|
|
The following Python packages are required: |
|
|
|
- `transformers>=4.53.0` |
|
- `torch>=2.7.1` |
|
|
|
### Optional / Recommended |
|
- **flash-attention**: Installing [flash-attention](https://github.com/Dao-AILab/flash-attention) is recommended for faster, more memory-efficient inference, but it is not required.
|
- **sentence-transformers**: If you want to use the model via the `sentence-transformers` interface, install this package as well. |
|
</details> |
|
|
|
<details> |
|
<summary>via <a href="https://huggingface.co/docs/transformers/en/index">transformers</a></summary> |
|
|
|
```python |
|
# !pip install "transformers>=4.53.0" "torch>=2.7.1"
|
|
|
import torch |
|
import torch.nn.functional as F |
|
|
|
from transformers import AutoModel, AutoTokenizer |
|
|
|
INSTRUCTION_CONFIG = { |
|
"nl2code": { |
|
"query": "Find the most relevant code snippet given the following query:\n", |
|
"passage": "Candidate code snippet:\n" |
|
}, |
|
"qa": { |
|
"query": "Find the most relevant answer given the following question:\n", |
|
"passage": "Candidate answer:\n" |
|
}, |
|
"code2code": { |
|
"query": "Find an equivalent code snippet given the following code snippet:\n", |
|
"passage": "Candidate code snippet:\n" |
|
}, |
|
"code2nl": { |
|
"query": "Find the most relevant comment given the following code snippet:\n", |
|
"passage": "Candidate comment:\n" |
|
}, |
|
"code2completion": { |
|
"query": "Find the most relevant completion given the following start of code snippet:\n", |
|
"passage": "Candidate completion:\n" |
|
} |
|
} |
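
# Queries are embedded with the chosen task's "query" prefix, documents with the matching "passage" prefix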
|
|
|
MAX_LENGTH = 8192 |
|
|
|
def cosine_similarity(x, y):
|
x = F.normalize(x, p=2, dim=1) |
|
y = F.normalize(y, p=2, dim=1) |
|
return x @ y.T |
|
|
|
def last_token_pool(last_hidden_states, attention_mask): |
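    # With a left-padded batch, every sequence ends at the last position;
    # otherwise gather each row's final non-padding token via the attention mask.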
|
left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0]) |
|
if left_padding: |
|
return last_hidden_states[:, -1] |
|
else: |
|
sequence_lengths = attention_mask.sum(dim=1) - 1 |
|
batch_size = last_hidden_states.shape[0] |
|
return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths] |
|
|
|
def add_instruction(instruction, query): |
|
return f'{instruction}{query}' |
|
|
|
# The queries and documents to embed |
|
queries = [ |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["query"], "print hello world in python"), |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["query"], "initialize array of 5 zeros in c++") |
|
] |
|
documents = [ |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["passage"], "print('Hello World!')"), |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["passage"], "int arr[5] = {0, 0, 0, 0, 0};") |
|
] |
|
all_inputs = queries + documents |
|
|
|
tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-code-embeddings-1.5b') |
|
model = AutoModel.from_pretrained('jinaai/jina-code-embeddings-1.5b') |
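
# Optionally (if flash-attention is installed), load in bfloat16 with FlashAttention2:
#   model = AutoModel.from_pretrained(
#       'jinaai/jina-code-embeddings-1.5b',
#       torch_dtype=torch.bfloat16,
#       attn_implementation='flash_attention_2',
#       device_map='cuda',
#   )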
|
|
|
batch_dict = tokenizer( |
|
all_inputs, |
|
padding=True, |
|
truncation=True, |
|
max_length=MAX_LENGTH, |
|
return_tensors="pt", |
|
) |
|
batch_dict = batch_dict.to(model.device)
|
outputs = model(**batch_dict) |
|
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask']) |
|
query_embeddings = embeddings[:2] |
|
passage_embeddings = embeddings[2:] |
|
|
|
# Compute the (cosine) similarity between the query and document embeddings |
|
scores = cosine_similarity(query_embeddings, passage_embeddings) |
|
print(scores) |
|
# tensor([[0.7647, 0.1115], |
|
# [0.0930, 0.6606]], grad_fn=<MmBackward0>) |
|
``` |
|
</details> |
|
|
|
<details> |
|
<summary>via <a href="https://sbert.net/">sentence-transformers</a></summary> |
|
|
|
```python |
|
# !pip install "sentence_transformers>=5.0.0" "torch>=2.7.1"
|
|
|
import torch |
|
from sentence_transformers import SentenceTransformer |
|
|
|
# Load the model |
|
model = SentenceTransformer( |
|
"jinaai/jina-code-embeddings-1.5b", |
|
model_kwargs={ |
|
"torch_dtype": torch.bfloat16, |
|
"attn_implementation": "flash_attention_2", |
|
"device_map": "cuda" |
|
}, |
|
tokenizer_kwargs={"padding_side": "left"}, |
|
) |
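
# Optionally pass truncate_dim=256 (or another Matryoshka size from the table above)
# to SentenceTransformer(...) to get lower-dimensional embeddings directly.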
|
|
|
# The queries and documents to embed |
|
queries = [ |
|
"print hello world in python", |
|
"initialize array of 5 zeros in c++" |
|
] |
|
documents = [ |
|
"print('Hello World!')", |
|
"int arr[5] = {0, 0, 0, 0, 0};" |
|
] |
|
|
|
query_embeddings = model.encode(queries, prompt_name="nl2code_query") |
|
document_embeddings = model.encode(documents, prompt_name="nl2code_document") |
|
|
|
# Compute the (cosine) similarity between the query and document embeddings |
|
similarity = model.similarity(query_embeddings, document_embeddings) |
|
print(similarity) |
|
# tensor([[0.7670, 0.1117], |
|
# [0.0938, 0.6607]]) |
|
``` |
|
</details> |
|
|
|
<details> |
|
<summary>via <a href="https://github.com/vllm-project/vllm">vLLM</a></summary> |
|
|
|
```python |
|
|
|
import torch |
|
import torch.nn.functional as F |
|
from vllm import LLM |
|
|
|
INSTRUCTION_CONFIG = { |
|
"nl2code": { |
|
"query": "Find the most relevant code snippet given the following query:\n", |
|
"passage": "Candidate code snippet:\n" |
|
}, |
|
"qa": { |
|
"query": "Find the most relevant answer given the following question:\n", |
|
"passage": "Candidate answer:\n" |
|
}, |
|
"code2code": { |
|
"query": "Find an equivalent code snippet given the following code snippet:\n", |
|
"passage": "Candidate code snippet:\n" |
|
}, |
|
"code2nl": { |
|
"query": "Find the most relevant comment given the following code snippet:\n", |
|
"passage": "Candidate comment:\n" |
|
}, |
|
"code2completion": { |
|
"query": "Find the most relevant completion given the following start of code snippet:\n", |
|
"passage": "Candidate completion:\n" |
|
} |
|
} |
|
|
|
def add_instruction(instruction, text): |
|
return f"{instruction}{text}" |
|
|
|
def cosine_similarity(x, y): |
|
x = F.normalize(x, p=2, dim=1) |
|
y = F.normalize(y, p=2, dim=1) |
|
return x @ y.T |
|
|
|
# Build the queries and documents |
|
queries = [ |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["query"], "print hello world in python"), |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["query"], "initialize array of 5 zeros in c++"), |
|
] |
|
documents = [ |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["passage"], "print('Hello World!')"), |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["passage"], "int arr[5] = {0, 0, 0, 0, 0};"), |
|
] |
|
all_inputs = queries + documents |
|
|
|
# vLLM embedding model |
|
llm = LLM( |
|
model="jinaai/jina-code-embeddings-1.5b", |
|
task="embed" |
|
) |
|
|
|
# Encode with vLLM |
|
outputs = llm.encode(all_inputs) |
|
|
|
# Collect embeddings into a single tensor |
|
emb_list = [] |
|
for out in outputs: |
|
vec = out.outputs.data.detach() |
|
emb_list.append(vec) |
|
embeddings = torch.stack(emb_list, dim=0) |
|
|
|
# Split into query and passage embeddings |
|
n_q = len(queries) |
|
query_embeddings = embeddings[:n_q] |
|
passage_embeddings = embeddings[n_q:] |
|
|
|
# Cosine similarity matrix (queries x documents) |
|
scores = cosine_similarity(query_embeddings, passage_embeddings) |
|
print(scores) |
|
# tensor([[0.7650, 0.1118], |
|
# [0.0937, 0.6613]]) |
|
``` |
|
|
|
</details> |
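
Whichever backend you use, the query-by-document similarity matrix can be turned into retrieval results by taking the argmax over the document axis. A minimal sketch, using the score values printed above as a stand-in for a freshly computed `scores` tensor:

```python
import torch

# Query-by-document cosine similarities, e.g. the `scores` tensor from the examples above
scores = torch.tensor([[0.7647, 0.1115],
                       [0.0930, 0.6606]])

# Pick the best-matching document for each query
best = scores.argmax(dim=1)
for query_idx, doc_idx in enumerate(best.tolist()):
    print(f"query {query_idx} -> document {doc_idx} (score {scores[query_idx, doc_idx].item():.4f})")
```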
|
|
|
## Citation |
|
|
|
Please refer to the [jina-code-embeddings technical report](https://arxiv.org/abs/2508.21290) for training details and benchmarks. If you find the model useful in your research, please cite the following paper:
|
|
|
``` |
|
@misc{kryvosheieva2025efficientcodeembeddingscode, |
|
title={Efficient Code Embeddings from Code Generation Models}, |
|
author={Daria Kryvosheieva and Saba Sturua and Michael Günther and Scott Martens and Han Xiao}, |
|
year={2025}, |
|
eprint={2508.21290}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2508.21290}, |
|
} |
|
``` |
|
|
|
## Contact |
|
|
|
Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas. |