Embedding handling question
#3
by Helios888 - opened
Hello,
I am not sure I understand this:
# extract residue embeddings for the first ([0,:]) sequence in the batch and remove padded & special tokens, incl. prefix ([0,1:8])
emb_0 = embedding_repr_train.last_hidden_state[0,1:8] # shape (7 x 1024)
Does it mean that skipping the first token skips both the CLS token and the embeddings of the added prefix?
If so, I still don't understand: I believed each letter was a token, and the prefix ("" in my case) consists of multiple letters.
To extract all relevant embeddings of the first sequence in the batch while skipping unwanted tokens, is it enough to remove the padding tokens after running this line?
emb_0 = embedding_repr_train.last_hidden_state[0,1:]
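The slicing in question can be illustrated with a toy example that needs no model download. This is only a sketch under stated assumptions, not confirmed in this thread: the prefix is encoded as a single token at index 0, the tokenizer appends one end-of-sequence token after the residues (T5-style, with no CLS token), and padding is added on the right. The helper name `residue_embeddings` and the toy data are hypothetical.

```python
def residue_embeddings(hidden_row, mask_row):
    """Keep only the per-residue rows of one sequence's hidden states.

    hidden_row: list of per-token vectors, shape (tokens x dim)
    mask_row:   attention mask for that sequence (1 = real token, 0 = padding)
    Assumed layout: [prefix, residue_1 .. residue_L, </s>, pad, ...]
    """
    n_real = sum(mask_row)            # prefix + L residues + end-of-sequence token
    return hidden_row[1:n_real - 1]   # drop prefix (index 0), </s>, and padding


# Toy batch entry: 7 residues, padded to 12 tokens, embedding dimension 4.
seq_len, padded_len, dim = 7, 12, 4
mask = [1] * (seq_len + 2) + [0] * (padded_len - seq_len - 2)
hidden = [[float(i)] * dim for i in range(padded_len)]  # row i is filled with the value i

emb_0 = residue_embeddings(hidden, mask)
print(len(emb_0), len(emb_0[0]))  # number of residues, embedding dimension
```

Under these assumptions, `[0,1:]` on its own would still contain the end-of-sequence token and any padding rows, so the upper bound has to come from the true sequence length or the attention mask, as `[0,1:8]` does for a 7-residue sequence.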