Embedding handling question

by Helios888 - opened Jun 27, 2024


Hello,

I am not sure I understand this snippet:

# extract residue embeddings for the first ([0,:]) sequence in the batch and remove padded & special tokens, incl. prefix ([0,1:8]) 
emb_0 = embedding_repr_train.last_hidden_state[0,1:8] # shape (7 x 1024)

Does it mean that skipping the first token skips both the CLS token and the embedding of the added prefix?
If so, I still don't understand: I believed each letter was a token, and the prefix ("" in my case) consists of multiple letters.
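A minimal sketch of how the hard-coded `[0,1:8]` slice could be derived from the attention mask instead. It assumes, as the `incl. prefix ([0,1:8])` comment implies, that position 0 holds a single prefix token, followed by one token per residue, one trailing end-of-sequence token (`</s>`), and then padding. That layout, the toy sequence length of 7 (matching the `(7 x 1024)` shape), and the helper name `residue_slice` are assumptions for illustration. Plain Python lists stand in for the `last_hidden_state` tensor so the example runs without PyTorch.

```python
def residue_slice(attention_mask_row, n_prefix=1, n_suffix=1):
    """Return (start, stop) indices covering only the residue tokens.

    Hypothetical helper. Assumed layout: n_prefix prefix tokens, then the
    residues, then n_suffix trailing special tokens (e.g. </s>), then padding
    positions, which have mask value 0.
    """
    n_real = sum(attention_mask_row)  # non-padding tokens, incl. prefix and </s>
    return n_prefix, n_real - n_suffix


# Toy example: 7 residues padded to length 12.
# Assumed layout: [prefix] r1 r2 r3 r4 r5 r6 r7 [</s>] [pad] [pad] [pad]
mask_row = [1] * 9 + [0] * 3
start, stop = residue_slice(mask_row)
print(start, stop)  # → 1 8

# Stand-in for last_hidden_state[0]: one small vector per token position.
hidden_row = [[float(i)] * 4 for i in range(len(mask_row))]
emb_0 = hidden_row[start:stop]
print(len(emb_0))  # → 7
```

With a real batch, the same bounds would be computed per sequence from that sequence's row of the attention mask, since padded sequences of different lengths end at different positions.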

To extract all relevant embeddings of the first sequence in the batch while skipping unwanted tokens, is it enough to remove the padding tokens after running this line?

emb_0 = embedding_repr_train.last_hidden_state[0,1:]
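Under the same assumed layout (one prefix token, the residues, one trailing `</s>` token, then padding), the sketch below compares `[0,1:]` followed by dropping padding positions against the hard-coded `[0,1:8]` slice. The position of `</s>` is an assumption about the tokenizer, not something stated in the post, and plain lists again stand in for the tensors.

```python
# Assumed layout for one sequence of 7 residues padded to length 12:
# index 0 = prefix token, 1..7 = residues, 8 = </s>, 9..11 = padding.
mask_row = [1] * 9 + [0] * 3
hidden_row = [[float(i)] * 4 for i in range(len(mask_row))]  # toy per-token vectors

# Option asked about: skip index 0, then drop padding positions using the mask.
kept = [vec for vec, m in zip(hidden_row[1:], mask_row[1:]) if m == 1]

# Hard-coded slice from the original example.
reference = hidden_row[1:8]

print(len(kept), len(reference))  # → 8 7
print(kept[-1][0])  # → 8.0, the vector at the assumed </s> position
```

If the tokenizer does append `</s>`, the mask-only approach keeps one extra vector at the end, which would also need to be dropped to match the `[0,1:8]` result.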
