|
--- |
|
base_model: |
|
- Qwen/Qwen2.5-Coder-1.5B |
|
license: cc-by-nc-4.0 |
|
tags: |
|
- feature-extraction |
|
- mteb |
|
- sentence-transformers |
|
- text-embeddings-inference |
|
inference: false |
|
library_name: transformers |
|
pipeline_tag: feature-extraction |
|
--- |
|
|
|
<br><br> |
|
|
|
<p align="center"> |
|
<img src="https://huggingface.co/datasets/jinaai/documentation-images/resolve/main/logo.webp" alt="Jina AI: Your Search Foundation, Supercharged!" width="150px"> |
|
</p> |
|
|
|
<p align="center"> |
|
<b>The code embedding model trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b> |
|
</p> |
|
|
|
# Jina Code Embeddings: A Small but Performant Code Embedding Model |
|
|
|
## Intended Usage & Model Info |
|
`jina-code-embeddings` is an embedding model for code retrieval. |
|
The model supports various types of code retrieval (text-to-code, code-to-code, code-to-text, code-to-completion) and technical question answering across 15+ programming languages. |
|
|
|
|
|
Built on [Qwen/Qwen2.5-Coder-1.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B), `jina-code-embeddings-1.5b` features: |
|
|
|
- **Multilingual support** (15+ programming languages) with coverage of a wide range of domains, including web development, software development, machine learning, data science, and educational coding problems.
|
- **Task-specific instruction prefixes** for NL2Code, Code2Code, Code2NL, Code2Completion, and Technical QA, which can be selected at inference time. |
|
- **Flexible embedding size**: dense embeddings are 1536-dimensional by default but can be truncated to as few as 128 dimensions with minimal performance loss (see the truncation sketch after the feature table below).
|
|
|
|
|
Summary of features: |
|
|
|
| Feature | Jina Code Embeddings 1.5B | |
|
|------------|------------| |
|
| Base Model | Qwen2.5-Coder-1.5B | |
|
| Supported Tasks | `nl2code`, `code2code`, `code2nl`, `code2completion`, `qa` | |
|
| Model DType | BFloat16 |
|
| Max Sequence Length | 32768 | |
|
| Embedding Vector Dimension | 1536 | |
|
| Matryoshka dimensions | 128, 256, 512, 1024, 1536 | |
|
| Pooling Strategy | Last-token pooling | |
|
| Attention Mechanism | FlashAttention2 | |
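
Because the output is Matryoshka-style, the 1536-dimensional embeddings can be shortened to any of the sizes listed above. Below is a minimal sketch of the truncation step; random tensors stand in for real embeddings and `k = 256` is just an example: keep the first `k` dimensions and re-normalize before computing similarities.

```python
import torch
import torch.nn.functional as F

# Stand-in for a batch of 1536-dimensional embeddings produced by the model
embeddings = torch.randn(4, 1536)

# Keep the first k dimensions (one of the supported Matryoshka sizes) and re-normalize
k = 256
truncated = F.normalize(embeddings[:, :k], p=2, dim=1)
print(truncated.shape)  # torch.Size([4, 256])
```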
|
|
|
## Usage |
|
|
|
<details> |
|
<summary>Requirements</summary>
|
|
|
The following Python packages are required: |
|
|
|
- `transformers>=4.53.0` |
|
- `torch>=2.7.1` |
|
|
|
### Optional / Recommended |
|
- **flash-attention**: Installing [flash-attention](https://github.com/Dao-AILab/flash-attention) is recommended for faster, more memory-efficient inference, but it is not required.
|
- **sentence-transformers**: If you want to use the model via the `sentence-transformers` interface, install this package as well. |
|
</details> |
|
|
|
<details> |
|
<summary>via <a href="https://huggingface.co/docs/transformers/en/index">transformers</a></summary> |
|
|
|
```python |
|
# !pip install "transformers>=4.53.0" "torch>=2.7.1"
|
|
|
import torch |
|
import torch.nn.functional as F |
|
|
|
from transformers import AutoModel, AutoTokenizer |
|
|
|
INSTRUCTION_CONFIG = { |
|
"nl2code": { |
|
"query": "Find the most relevant code snippet given the following query:\n", |
|
"passage": "Candidate code snippet:\n" |
|
}, |
|
"qa": { |
|
"query": "Find the most relevant answer given the following question:\n", |
|
"passage": "Candidate answer:\n" |
|
}, |
|
"code2code": { |
|
"query": "Find an equivalent code snippet given the following code snippet:\n", |
|
"passage": "Candidate code snippet:\n" |
|
}, |
|
"code2nl": { |
|
"query": "Find the most relevant comment given the following code snippet:\n", |
|
"passage": "Candidate comment:\n" |
|
}, |
|
"code2completion": { |
|
"query": "Find the most relevant completion given the following start of code snippet:\n", |
|
"passage": "Candidate completion:\n" |
|
} |
|
} |
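
# Queries are embedded with the chosen task's "query" prefix, documents with the matching "passage" prefix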
|
|
|
MAX_LENGTH = 8192 |
|
|
|
def cosine_similarity(x, y):
|
x = F.normalize(x, p=2, dim=1) |
|
y = F.normalize(y, p=2, dim=1) |
|
return x @ y.T |
|
|
|
def last_token_pool(last_hidden_states, attention_mask): |
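    # With a left-padded batch, every sequence ends at the last position;
    # otherwise gather each row's final non-padding token via the attention mask.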
|
left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0]) |
|
if left_padding: |
|
return last_hidden_states[:, -1] |
|
else: |
|
sequence_lengths = attention_mask.sum(dim=1) - 1 |
|
batch_size = last_hidden_states.shape[0] |
|
return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths] |
|
|
|
def add_instruction(instruction, query): |
|
return f'{instruction}{query}' |
|
|
|
# The queries and documents to embed |
|
queries = [ |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["query"], "print hello world in python"), |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["query"], "initialize array of 5 zeros in c++") |
|
] |
|
documents = [ |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["passage"], "print('Hello World!')"), |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["passage"], "int arr[5] = {0, 0, 0, 0, 0};") |
|
] |
|
all_inputs = queries + documents |
|
|
|
tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-code-embeddings-1.5b') |
|
model = AutoModel.from_pretrained('jinaai/jina-code-embeddings-1.5b') |
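
# Optionally (if flash-attention is installed), load in bfloat16 with FlashAttention2:
#   model = AutoModel.from_pretrained(
#       'jinaai/jina-code-embeddings-1.5b',
#       torch_dtype=torch.bfloat16,
#       attn_implementation='flash_attention_2',
#       device_map='cuda',
#   )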
|
|
|
batch_dict = tokenizer( |
|
all_inputs, |
|
padding=True, |
|
truncation=True, |
|
max_length=MAX_LENGTH, |
|
return_tensors="pt", |
|
) |
|
batch_dict = batch_dict.to(model.device)
|
outputs = model(**batch_dict) |
|
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask']) |
|
query_embeddings = embeddings[:2] |
|
passage_embeddings = embeddings[2:] |
|
|
|
# Compute the (cosine) similarity between the query and document embeddings |
|
scores = cosine_similarity(query_embeddings, passage_embeddings) |
|
print(scores) |
|
# tensor([[0.7647, 0.1115], |
|
# [0.0930, 0.6606]], grad_fn=<MmBackward0>) |
|
``` |
|
</details> |
|
|
|
<details> |
|
<summary>via <a href="https://sbert.net/">sentence-transformers</a></summary> |
|
|
|
```python |
|
# !pip install "sentence_transformers>=5.0.0" "torch>=2.7.1"
|
|
|
import torch |
|
from sentence_transformers import SentenceTransformer |
|
|
|
# Load the model |
|
model = SentenceTransformer( |
|
"jinaai/jina-code-embeddings-1.5b", |
|
model_kwargs={ |
|
"torch_dtype": torch.bfloat16, |
|
"attn_implementation": "flash_attention_2", |
|
"device_map": "cuda" |
|
}, |
|
tokenizer_kwargs={"padding_side": "left"}, |
|
) |
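
# Optionally pass truncate_dim=256 (or another Matryoshka size from the table above)
# to SentenceTransformer(...) to get lower-dimensional embeddings directly.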
|
|
|
# The queries and documents to embed |
|
queries = [ |
|
"print hello world in python", |
|
"initialize array of 5 zeros in c++" |
|
] |
|
documents = [ |
|
"print('Hello World!')", |
|
"int arr[5] = {0, 0, 0, 0, 0};" |
|
] |
|
|
|
query_embeddings = model.encode(queries, prompt_name="nl2code_query") |
|
document_embeddings = model.encode(documents, prompt_name="nl2code_document") |
|
|
|
# Compute the (cosine) similarity between the query and document embeddings |
|
similarity = model.similarity(query_embeddings, document_embeddings) |
|
print(similarity) |
|
# tensor([[0.7670, 0.1117], |
|
# [0.0938, 0.6607]]) |
|
``` |
|
</details> |
|
|
|
<details> |
|
<summary>via <a href="https://github.com/vllm-project/vllm">vLLM</a></summary> |
|
|
|
```python |
|
|
|
import torch |
|
import torch.nn.functional as F |
|
from vllm import LLM |
|
|
|
INSTRUCTION_CONFIG = { |
|
"nl2code": { |
|
"query": "Find the most relevant code snippet given the following query:\n", |
|
"passage": "Candidate code snippet:\n" |
|
}, |
|
"qa": { |
|
"query": "Find the most relevant answer given the following question:\n", |
|
"passage": "Candidate answer:\n" |
|
}, |
|
"code2code": { |
|
"query": "Find an equivalent code snippet given the following code snippet:\n", |
|
"passage": "Candidate code snippet:\n" |
|
}, |
|
"code2nl": { |
|
"query": "Find the most relevant comment given the following code snippet:\n", |
|
"passage": "Candidate comment:\n" |
|
}, |
|
"code2completion": { |
|
"query": "Find the most relevant completion given the following start of code snippet:\n", |
|
"passage": "Candidate completion:\n" |
|
} |
|
} |
|
|
|
def add_instruction(instruction, text): |
|
return f"{instruction}{text}" |
|
|
|
def cosine_similarity(x, y): |
|
x = F.normalize(x, p=2, dim=1) |
|
y = F.normalize(y, p=2, dim=1) |
|
return x @ y.T |
|
|
|
# Build the queries and documents |
|
queries = [ |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["query"], "print hello world in python"), |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["query"], "initialize array of 5 zeros in c++"), |
|
] |
|
documents = [ |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["passage"], "print('Hello World!')"), |
|
add_instruction(INSTRUCTION_CONFIG["nl2code"]["passage"], "int arr[5] = {0, 0, 0, 0, 0};"), |
|
] |
|
all_inputs = queries + documents |
|
|
|
# vLLM embedding model |
|
llm = LLM( |
|
model="jinaai/jina-code-embeddings-1.5b", |
|
task="embed" |
|
) |
|
|
|
# Encode with vLLM |
|
outputs = llm.encode(all_inputs) |
|
|
|
# Collect embeddings into a single tensor |
|
emb_list = [] |
|
for out in outputs: |
|
vec = out.outputs.data.detach() |
|
emb_list.append(vec) |
|
embeddings = torch.stack(emb_list, dim=0) |
|
|
|
# Split into query and passage embeddings |
|
n_q = len(queries) |
|
query_embeddings = embeddings[:n_q] |
|
passage_embeddings = embeddings[n_q:] |
|
|
|
# Cosine similarity matrix (queries x documents) |
|
scores = cosine_similarity(query_embeddings, passage_embeddings) |
|
print(scores) |
|
# tensor([[0.7650, 0.1118], |
|
# [0.0937, 0.6613]]) |
|
``` |
|
|
|
</details> |
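
Whichever backend you use, the query-by-document similarity matrix can be turned into retrieval results by taking the argmax over the document axis. A minimal sketch, using the score values printed above as a stand-in for a freshly computed `scores` tensor:

```python
import torch

# Query-by-document cosine similarities, e.g. the `scores` tensor from the examples above
scores = torch.tensor([[0.7647, 0.1115],
                       [0.0930, 0.6606]])

# Pick the best-matching document for each query
best = scores.argmax(dim=1)
for query_idx, doc_idx in enumerate(best.tolist()):
    print(f"query {query_idx} -> document {doc_idx} (score {scores[query_idx, doc_idx].item():.4f})")
```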
|
|
|
## Citation |
|
|
|
Please refer to the [jina-code-embeddings technical report](https://arxiv.org/abs/2508.21290) for training details and benchmarks. If you find the model useful in your research, please cite the following paper:
|
|
|
``` |
|
@misc{kryvosheieva2025efficientcodeembeddingscode, |
|
title={Efficient Code Embeddings from Code Generation Models}, |
|
author={Daria Kryvosheieva and Saba Sturua and Michael Günther and Scott Martens and Han Xiao}, |
|
year={2025}, |
|
eprint={2508.21290}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2508.21290}, |
|
} |
|
``` |
|
|
|
## Contact |
|
|
|
Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas. |