The CodeGen architecture follows a standard transformer decoder with left-to-right causal masking. With rotary position embedding for the positional encoding (Su et al., 2021), and a context length of 2048. CodeGen models are trained in various sizes.
You can load the model and tokenizer directly from 🤗 transformers
:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained('Salesforce/codegen-16B-mono')
model = AutoModelForCausalLM.from_pretrained('Salesforce/codegen-16B-mono')
inputs = tokenizer("def hello_world():", return_tensors="pt")
outputs = model(**inputs)