	The values in these tensors depend on the language used and are identified by the tokenizer's lang2id and id2lang attributes.
	In this example, load the FacebookAI/xlm-clm-enfr-1024 checkpoint (causal language modeling, English-French):

	import torch
	from transformers import XLMTokenizer, XLMWithLMHeadModel
	tokenizer = XLMTokenizer.from_pretrained("FacebookAI/xlm-clm-enfr-1024")
	model = XLMWithLMHeadModel.from_pretrained("FacebookAI/xlm-clm-enfr-1024")

	The lang2id attribute of the tokenizer displays this model's languages and their ids:

	print(tokenizer.lang2id)
	{'en': 0, 'fr': 1}

	Next, create an example input:

	input_ids = torch.tensor([tokenizer.encode("Wikipedia was used to")])  # batch size of 1

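	Set the language id to "en" and use it to define the language embedding. Because the id for English is 0, the language embedding is a tensor filled with 0, and it should have the same shape as input_ids.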
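
A minimal sketch of this step, written so that it runs without downloading the checkpoint. The `lang2id` dictionary copies the mapping printed above, and the token ids in `input_ids` are placeholders (an assumption for illustration, not the real output of `tokenizer.encode`). With the real tokenizer, `tokenizer.lang2id["en"]` and the `input_ids` tensor created earlier would be used instead.

```python
import torch

# Language-to-id mapping, copied from the tokenizer.lang2id output shown above.
lang2id = {"en": 0, "fr": 1}

# Placeholder token ids standing in for tokenizer.encode(...); the real ids
# depend on the checkpoint's vocabulary. Shape: (batch_size=1, sequence_length=5).
input_ids = torch.tensor([[0, 11, 12, 13, 1]])

# Look up the id for English.
language_id = lang2id["en"]

# Build one language id per token, with the same shape and dtype as input_ids.
langs = torch.full_like(input_ids, language_id)

print(langs.shape)
```

With the real model loaded, this tensor would typically be passed together with the input, for example `outputs = model(input_ids, langs=langs)`.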