Automatically add EOS via Tokenizer, integrate Sentence Transformers

#2
by tomaarsen - opened

Hello!

Preface

I already discussed a lot of these changes with @izhx over on the MTEB project here, so he'll already have a good understanding of the changes in this PR.

Pull Request overview

  • Updated the tokenizer to automatically add an EOS token. I ran the code that @izhx and I wrote here: https://github.com/embeddings-benchmark/mteb/pull/2769#issuecomment-2944905730 to add a TemplateProcessing post-processor to the tokenizer, which appends the <|endoftext|> token on which we perform pooling (see the sketch right after this list).
  • Updated the transformers usage snippet accordingly - it's simpler now, but still gives the same results (feel free to compare; there's a small comparison sketch after that snippet).
  • Added the Sentence Transformers configuration files. This model already fits the mold that Sentence Transformers supports, so all we need is some configuration files (see the sketch after the Sentence Transformers snippet below).
  • Added a simple usage script via Sentence Transformers (note: some third parties like LangChain and LlamaIndex also use Sentence Transformers, so those will work too).
  • Added some tags to the model card to make this model easier to find with filtering, etc.
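
For reference, the tokenizer change essentially amounts to attaching a post-processor like the one below. This is a minimal sketch based on the linked MTEB comment; the PR already contains the updated tokenizer files, so there is no need to run this yourself.

from tokenizers.processors import TemplateProcessing
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
eos_token = "<|endoftext|>"
eos_token_id = tokenizer.convert_tokens_to_ids(eos_token)

# Append <|endoftext|> to every encoded sequence so that last-token pooling
# always pools over the EOS token, without manual string concatenation
tokenizer.backend_tokenizer.post_processor = TemplateProcessing(
    single=f"$A {eos_token}",
    pair=f"$A {eos_token} $B:1 {eos_token}:1",
    special_tokens=[(eos_token, eos_token_id)],
)

With this post-processor in place, every encoded sequence already ends in <|endoftext|>, which is why the usage snippets below no longer append it manually.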

How to try this PR?

You can run the following to try this out:

Run this PR with Sentence Transformers
# Requires transformers>=4.51.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", revision="refs/pr/2")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-0.6B",
#     revision="refs/pr/2",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7646, 0.1414],
#         [0.1355, 0.6000]])

(Note the revision argument)
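
For reference, the Sentence Transformers configuration files added in this PR describe a simple three-module pipeline (Transformer, last-token Pooling, Normalize) plus the stored "query" prompt used above. Building the equivalent model programmatically would look roughly like the sketch below; this is for illustration only, as the configuration files are read automatically when loading the model by name as shown above.

from sentence_transformers import SentenceTransformer, models

transformer = models.Transformer(
    "Qwen/Qwen3-Embedding-0.6B",
    tokenizer_args={"padding_side": "left"},
)
pooling = models.Pooling(
    transformer.get_word_embedding_dimension(),
    pooling_mode="lasttoken",  # pool the <|endoftext|> token appended by the tokenizer
)
normalize = models.Normalize()

model = SentenceTransformer(modules=[transformer, pooling, normalize])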

Or

Run this PR with Transformers
import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    # Pool by taking the hidden state of the last non-padding token,
    # i.e. the <|endoftext|> token that the tokenizer now appends
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        # With left padding, the last position is always the last real token
        return last_hidden_states[:, -1]
    else:
        # With right padding, select the last real token of each sequence
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-0.6B', padding_side='left', revision="refs/pr/2")
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B', revision="refs/pr/2")

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B', attn_implementation="flash_attention_2", torch_dtype=torch.float16, revision="refs/pr/2").cuda()

eod_id = tokenizer.convert_tokens_to_ids("<|endoftext|>")
max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
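
To verify the "same results" claim from the overview, a quick check could look like this. It's only a sketch: it assumes both snippets were run in the same session, with the Sentence Transformers variables renamed (e.g. similarity -> st_similarity) so they don't clash with the ones above.

import torch

# `st_similarity` is the (renamed) similarity tensor from the Sentence Transformers
# snippet; `scores` is the score matrix computed just above with plain Transformers
print(torch.allclose(st_similarity, scores, atol=1e-3))
# Expected to print True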

Question

With this PR in place, I can make identical changes to the 4B and 8B models. Please let me know if you would welcome that!

cc @littlebird13 @JustinLin610 @izhx

  • Tom Aarsen
tomaarsen changed pull request status to open

You can also remove the eod_id line in the README :)

Good call!

I checked, the outputs are consistent!

It also helps vLLM to produce the correct outputs. 🤣

Checking other code..
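
On the vLLM remark above: since the EOS token is now appended by the tokenizer itself, vLLM's pooling path picks it up as well. Below is a minimal sketch, assuming a recent vLLM release with the task="embed" pooling API; check your installed version, as this API has changed between releases.

import torch
from vllm import LLM

# Assumes a recent vLLM with pooling ("embed") support
llm = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed")

input_texts = [
    "What is the capital of China?",
    "The capital of China is Beijing.",
]
outputs = llm.embed(input_texts)
embeddings = torch.tensor([output.outputs.embedding for output in outputs])
print(embeddings.shape)  # torch.Size([2, <embedding dim>])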

littlebird13 changed pull request status to merged

@tomaarsen May I ask which version of the sentence-transformers library supports qwen3-embedding? It would be helpful to include this information in the README.
